
Executing Plans

This section illustrates several ways to use the plan module.

Launching mode concepts

Because plans are designed to be resilient, several launching modes were introduced to meet operational users' requirements in a production environment. Namely:

  • WITH_LAST_COMMITTED_FROM_START_TO_END_DATE

    activated when both a start date and an end date are defined in plan_settings

  • WITH_LAST_COMMITTED_FROM_START_DATE

    activated when only a start date is defined in plan_settings (in this mode, the date of the last successful execution takes priority if one exists)

  • WITH_LAST_COMMITTED_TO_END_DATE

    activated when only an end date is defined in plan_settings (in this mode, the date of the last successful execution is taken into account)

  • WITH_LAST_COMMITTED

    activated when neither a start date nor an end date is defined in plan_settings (in this mode, the recovery date of the last successful execution is taken into account)

  • WITHOUT_LAST_COMMITTED

    activated when a plan is launched without the --last-committed argument
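The mode selection described above can be sketched as a simple decision function (an illustrative sketch only, not the actual PlanLauncher implementation; the function name and signature are assumptions):

```python
# Sketch of the launch-mode selection rules listed above: the mode depends on
# which of the two optional dates are defined in plan_settings, and on
# whether the plan was launched with --last-committed.
def select_launch_mode(start_date=None, end_date=None, last_committed=True):
    if not last_committed:
        return "WITHOUT_LAST_COMMITTED"
    if start_date and end_date:
        return "WITH_LAST_COMMITTED_FROM_START_TO_END_DATE"
    if start_date:
        return "WITH_LAST_COMMITTED_FROM_START_DATE"
    if end_date:
        return "WITH_LAST_COMMITTED_TO_END_DATE"
    return "WITH_LAST_COMMITTED"
```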

Plan LOG types

  • PLAN_FIRST_EXECUTION

    When neither a recovery index nor a recovery document containing the required plan information is found

  • PLAN_RESUMED_FROM_CHECKPOINT

    When a recovery document is found containing the metadata needed for the plan to resume

  • PLAN_IGNORED_LAST_COMMITTED_START_DATE

    When a plan is launched in WITH_LAST_COMMITTED_FROM_START_DATE mode

  • PLAN_UPTIME

    A plan metric reporting the uptime (in seconds) of the plan process

Job LOG types

  • JOB_STARTED

    When a job executed by a plan begins

  • JOB_ENDED

    When a job executed by a plan ends

  • JOB_FAILED

    When a job executed by a plan fails

  • ERROR

    When something unexpected happens

This section explains how to launch punchline plans in practice. A list of useful commands to keep in mind can be found below.

It is important to have a basic understanding of Spark concepts. If that is not the case, please go through the Plan concepts section first.

Example LOG structure

{
  "content": {
    "unit": "seconds",
    "event_type": "PLAN_UPTIME",
    "job_runtime_id": "5b5ffef8-eca4-45e0-a159-2738839f14c7",
    "level": "INFO",
    "rate": 10,
    "logger": "org.thales.punch.plan.api.PlanLauncher",
    "channel": "default",
    "launch_mode": "WITH_LAST_COMMITTED",
    "plan.runtime.id": "default",
    "start_date": "2019-11-15T14:57:15.809Z",
    "uptime": 39
  },
  "target": {
    "cluster": "foreground",
    "type": "spark"
  },
  "init": {
    "process": {
      "name": "plan_controller",
      "id": "5624@PUNCH-LPT35-003"
    },
    "host": {
      "name": "PUNCH-LPT35-003"
    },
    "user": {
      "name": "jonathan"
    }
  },
  "platform": {
    "channel": "default",
    "id": "default",
    "job": "no_shiva",
    "tenant": "default"
  },
  "type": "punch",
  "vendor": "thales",
  "@timestamp": "2019-11-15T14:57:15.809Z",
  "fields": {
    "@timestamp": [
      "2019-11-15T14:57:15.809Z"
    ],
    "content.start_date": [
      "2019-11-15T14:57:15.809Z"
    ]
  }
}
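To make the structure concrete, here is a minimal sketch showing how such a PLAN_UPTIME event could be consumed, using a trimmed-down copy of the example above (the consuming code is illustrative, not part of the punch tooling):

```python
import json

# Trimmed-down PLAN_UPTIME event, shaped like the example above.
raw = """
{
  "content": {
    "event_type": "PLAN_UPTIME",
    "unit": "seconds",
    "uptime": 39,
    "launch_mode": "WITH_LAST_COMMITTED"
  },
  "type": "punch"
}
"""

event = json.loads(raw)

# The uptime metric and its unit live under the content key.
if event["content"]["event_type"] == "PLAN_UPTIME":
    uptime = event["content"]["uptime"]
    print(f"plan up for {uptime} {event['content']['unit']}")  # plan up for 39 seconds
```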

Common fields:

  • start_date: refers to the date when the plan or a PunchLine was executed
  • job_runtime_id: runtime_id of the PunchLine executed by the plan. The runtime_id (generated if not set by the user) is unique as long as the plan process is running. If the plan dies and later resumes, a new runtime_id is assigned to every PunchLine the plan executes.
  • channel: name of the channel in which the plan is located
  • ...

Executing a channel containing a plan job with shiva

Why shiva ?

A plan guarantees the resiliency of the jobs it executes, whereas shiva guarantees the resiliency of plans.

Channel structure example

Directory structure

tree $PATH_TO_CHANNELS_DIR/my_plan1
├── channel_structure.json
├── job.template
├── plan.hjson
└── README.md

Channel structure configuration example

{
    "version": "5",
    "start_by_tenant": false,
    "stop_by_tenant": true,
    "jobs": [
        {
            "type": "shiva",
            "name": "plan-example-1",
            "command": "plan",
            "args": [
                "--plan", "plan.hjson",
                "--template", "job.template",
                "--deploy-mode", "foreground",
                "--last-committed"
            ],
            "resources": [
                "plan.hjson",
                "job.template"
            ],
            "cluster": "common",
            "shiva_runner_tags": [
                "standalone"
            ]
        }
    ]
}

Executing a plan in a channel structure

channelctl start --channel my_plan1

Stopping a plan in a channel structure

channelctl stop --channel my_plan1

Executing a plan with punchlinectl shell

Example File Configuration

In addition to jobs, the punchline client has the ability to execute Plans.

In brief, a Plan provides a way of executing jobs periodically with the help of templating.

Use case: fetching data from an Elasticsearch cluster every 10 minutes.

To accomplish this, we will make use of the two files below:

  • a plan
  • a job template

Definition: think of the plan as the file holding all the metadata needed by your job, such as variables. The job template, in turn, is the file describing your use case. In other words, a job template relies on a plan for scheduling information (intervals, etc.) and sink metadata (cluster name, ports, etc.). Periodic job execution is thus achieved with the help of templating.
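The rendering step can be pictured with plain string substitution (a deliberately simplified sketch: the real engine uses a full template language with filters such as json_encode, as the job template below shows):

```python
# Simplified sketch of what the plan does on each tick: take the model
# variables (plus computed dates) and substitute them into the job
# template, producing a concrete punchline job.
template = 'index: "{{index_input}}-{{day}}"'
variables = {"index_input": "platform-metricbeat-*", "day": "2019.11.15"}

rendered = template
for name, value in variables.items():
    rendered = rendered.replace("{{%s}}" % name, value)

print(rendered)  # index: "platform-metricbeat-*-2019.11.15"
```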

Note

Jobs and Plans are compatible with both JSON and HJSON

Note

In a production environment, a plan can keep track of its last successful event and resume from it in case of failure. Use --last-committed to activate this option. This feature is only available when your plan is executed within a channel.

Let's view some examples to be more concrete:

plan.hjson:

With the help of templating, every minute a job template is generated with the appropriate metadata and then launched as a punchline job.

{
  model:{
    cluster_name: es_search
    index_input: platform-metricbeat-*
    nodes: ["localhost"]
    dates: {
      day: {
        offset: -PT1m
        format: yyyy.MM.dd
      }
      from: {
        offset: -PT1m
        format: yyyy-MM-dd'T'HH:mmZ
      }
      timezone: {
        format: +00:00
      }
      to: {
        format: yyyy-MM-dd'T'HH:mmZ
      }
    }
  }
  plan_settings: {
    cron: "*/1 * * * *"
    delay: PT10s
  }
}
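The dates block above is resolved against each cron tick. Here is a sketch of how the offsets and formats play out, assuming -PT1m means "one minute before the tick" (ISO 8601 duration) and that the Java-style patterns map to the strftime equivalents shown:

```python
from datetime import datetime, timedelta, timezone

# Pretend the cron fired at this instant (UTC).
tick = datetime(2019, 11, 15, 14, 57, tzinfo=timezone.utc)

# day:  offset -PT1m, format yyyy.MM.dd  -> strftime %Y.%m.%d
day = (tick - timedelta(minutes=1)).strftime("%Y.%m.%d")

# from: offset -PT1m, format yyyy-MM-dd'T'HH:mmZ -> strftime %Y-%m-%dT%H:%M%z
from_ = (tick - timedelta(minutes=1)).strftime("%Y-%m-%dT%H:%M%z")

# to:   no offset, same format
to = tick.strftime("%Y-%m-%dT%H:%M%z")

print(day, from_, to)  # 2019.11.15 2019-11-15T14:56+0000 2019-11-15T14:57+0000
```

Each tick thus yields a one-minute [from, to) window ending at the tick, which is exactly what the range query in the job template consumes.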

job-template.hjson:

{
  job: [
    {
      component: input
      publish: [
        {
          stream: data
        }
      ]
      settings: {
        cluster_name: "{{cluster_name}}"
        source_column: source
        index: "{{index_input}}-{{day}}"
        nodes: {{ nodes | json_encode() }}
        query: {
          range: {
            @timestamp: {
              gte: "{{from}}"
              lt: "{{to}}"
              time_zone: "{{timezone}}"
            }
          }
        }
      }
      type: elastic_batch_input
    }
    {
      component: show
      settings: {
      }
      subscribe: [
        {
          component: input
          stream: data
        }
      ]
      type: show
    }
  ]
}

How to launch/manage a Plan

Plans default behaviour

If a plan is executed in --last-committed mode and no persistence key is found in your plan configuration file, the following defaults apply:

  • default index name for cursor: platform-plan-cursor
  • default hostname for elasticsearch: localhost
  • default port for elasticsearch: 9200

Within the model key, which defines templating variables, the following names are reserved by the internal system and will be overridden if used:

  • name: defined in your plan configuration
  • tenant: defined in your plan configuration
  • channel: defined in your plan configuration
  • version: defined in your plan configuration

Launch a plan

To execute a plan, you can run these commands from your terminal:

# Default --runtime-environment is set to spark
# others: pyspark_5x (craig) and pyspark (dave)

# foreground/local
$ punchlinectl --plan <plan_path> --template <template_path>

# on a cluster
$ punchlinectl --plan <plan_path> --template <template_path> --deploy-mode cluster --spark-master <master_url>

Manage a plan cursor

A cli is provided to manage plan cursors, namely: punchline-plan-manager.

Prerequisites

plan-id generation rule

As you may already know, plan --last-committed is only supported when used with shiva.

# the plan ID is a composition of tenant, channel name and plan name
TENANT="mytenant"
CHANNEL="aggregation"
PLAN_NAME="myplan"

PLAN_ID="${TENANT}_${CHANNEL}_${PLAN_NAME}"

# echo $PLAN_ID
mytenant_aggregation_myplan

es-cluster

The Elasticsearch cluster is taken from punchplatform.properties.

All Elasticsearch nodes declared in your es_cluster must be discoverable from the machine on which you run punchline-plan-manager.

commands

With this command, you can:

  • view the cursor meta data that the plan will be using to restart from
  • change the cursor that a plan will be using to restart from

# to view the cursor metadata of a specific plan ID
punchline-plan-manager --list-cursor --plan-id myplanid --tenant mytenant --es-cluster es_search --index-name platform-plan-cursor

# to change the cursor that a plan will be using to restart from
punchline-plan-manager --reset-cursor --plan-id myplanid --tenant mytenant --restart-date 2020-01-01T12:12:29.771Z --es-cluster es_search --index-name platform-plan-cursor