
Executing Plans

This section illustrates the different ways you can execute plans with the plan module.

Launching mode concepts

Since plans are designed for resilience, several launch modes were introduced to meet the requirements of operational users in a production environment. Namely:

  • WITH_LAST_COMMITTED_FROM_START_TO_END_DATE

    activated when both a start date and a stop date are defined in plan_settings

  • WITH_LAST_COMMITTED_FROM_START_DATE

    activated when only a start date is defined in plan_settings (in this mode, the date of the last successful execution takes priority if it exists)

  • WITH_LAST_COMMITTED_TO_END_DATE

    activated when only an end date is defined in plan_settings (in this mode, the date of the last successful execution is taken into account)

  • WITH_LAST_COMMITTED

    activated when neither a start date nor an end date is defined in plan_settings (in this mode, the recovery date of the last successful execution is taken into account)

  • WITHOUT_LAST_COMMITTED

    activated when the plan is launched by shiva without the --last-committed argument
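As a sketch of how the first four modes are selected, the plan_settings fragment below defines both date bounds, which would select WITH_LAST_COMMITTED_FROM_START_TO_END_DATE. The start_date/stop_date key names are assumptions for illustration; check the plan reference for the exact keys in your punch version.

```hjson
plan_settings: {
  cron: "*/10 * * * *"
  # Hypothetical keys: with both bounds set, the plan launches in
  # WITH_LAST_COMMITTED_FROM_START_TO_END_DATE mode. Drop stop_date to get
  # WITH_LAST_COMMITTED_FROM_START_DATE, or drop both for WITH_LAST_COMMITTED.
  start_date: 2019-11-01T00:00+0000
  stop_date: 2019-11-15T00:00+0000
}
```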

Plan LOG types

  • PLAN_FIRST_EXECUTION

    When no recovery index or document was found containing the required plan information

  • PLAN_RESUMED_FROM_CHECKPOINT

    When a recovery document is found containing the metadata needed for the plan to resume

  • PLAN_IGNORED_LAST_COMMITTED_START_DATE

    When the plan is launched in WITH_LAST_COMMITTED_FROM_START_DATE mode

  • PLAN_UPTIME

    Plan metrics, including the uptime value (in seconds) of the plan process

Job LOG types

  • JOB_STARTED

    When a job executed by a plan begins

  • JOB_ENDED

    When a job executed by a plan ends

  • JOB_FAILED

    When a job executed by a plan fails

  • ERROR

    When something unexpected happens

This section explains how to launch punchline plans in practice. A list of useful commands to keep in mind is described below.

It is important to have a basic understanding of Spark concepts. If that is not the case, please go through the Plan concepts section first.

Example LOG structure

{
  "content": {
    "unit": "seconds",
    "event_type": "PLAN_UPTIME",
    "job_runtime_id": "5b5ffef8-eca4-45e0-a159-2738839f14c7",
    "level": "INFO",
    "rate": 10,
    "logger": "org.thales.punch.plan.api.PlanLauncher",
    "channel": "default",
    "launch_mode": "WITH_LAST_COMMITTED",
    "plan.runtime.id": "default",
    "start_date": "2019-11-15T14:57:15.809Z",
    "uptime": 39
  },
  "target": {
    "cluster": "foreground",
    "type": "spark"
  },
  "init": {
    "process": {
      "name": "plan_controller",
      "id": "5624@PUNCH-LPT35-003"
    },
    "host": {
      "name": "PUNCH-LPT35-003"
    },
    "user": {
      "name": "jonathan"
    }
  },
  "platform": {
    "channel": "default",
    "id": "default",
    "job": "no_shiva",
    "tenant": "default"
  },
  "type": "punch",
  "vendor": "thales",
  "@timestamp": "2019-11-15T14:57:15.809Z",
  "fields": {
    "@timestamp": [
      "2019-11-15T14:57:15.809Z"
    ],
    "content.start_date": [
      "2019-11-15T14:57:15.809Z"
    ]
  }
}

Common fields:

  • start_date: refers to the date when the plan or a PunchLine was executed
  • job_runtime_id: runtime_id of the PunchLine executed by the plan. The runtime_id (generated if not set by the user) is unique as long as the plan process is running. If the plan dies and later resumes, a new runtime_id will be assigned to all the PunchLines executed by the plan.
  • channel: name of the channel the plan is located in
  • ...

Executing a channel containing a plan job with shiva

Why shiva?

Plans guarantee the resiliency of the jobs they execute, whereas shiva guarantees the resiliency of plans.

Channel structure example

Directory structure

tree $PATH_TO_CHANNELS_DIR/my_plan1
├── channel_structure.json
├── job.template
├── plan.hjson
└── README.md

Channel structure configuration example

{
    "version": "5",
    "start_by_tenant": false,
    "stop_by_tenant": true,
    "jobs": [
        {
            "type": "shiva",
            "name": "plan-example-1",
            "command": "plan",
            "args": [
                "--plan", "plan.hjson",
                "--template", "job.template",
                "--deploy-mode", "foreground",
                "--last-committed"
            ],
            "resources": [
                "plan.hjson",
                "job.template"
            ],
            "cluster": "common",
            "shiva_runner_tags": [
                "standalone"
            ]
        }
    ]
}

Executing a plan in a channel structure

punchctl start --channel my_plan1

Stopping a plan in a channel structure

punchctl stop --channel my_plan1

Executing a plan with punchline shell

Example File Configuration

In addition to jobs, the PunchPlatform punchline client also lets you execute Plans.

In brief, a Plan provides a way of executing jobs periodically with the help of templating.

Use case: fetching data from an elasticsearch cluster every 10 minutes.

To accomplish this, we will use the two files below: a plan and a job template.

Definition: You can think of the plan as the file that holds all the metadata needed by your job, such as variables. The job template, in turn, is the file describing your use case. In other words, a job template relies on a plan for scheduling information (intervals, etc.) and sink metadata (cluster name, ports, etc.). Periodic job execution is thus achieved with the help of templating.

Note

Jobs and Plans are compatible with both JSON and HJSON

Note

In a production environment, a plan can keep track of its last successful event and resume from it in case of failure. Use --last-committed to activate this option. This feature is only available when your plan is executed within a channel.

Note

In a production environment, you can launch a plan between two specific dates with the --date parameter. The two dates, in string format, should be separated by a comma. The date format must be: yyyy-MM-dd'T'HH:mmZ. This feature is only available when your plan is executed within a channel.
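For instance, reusing the shiva args from the channel structure shown earlier on this page, the --date argument could be added as follows. The exact placement of the argument is an assumption for illustration; check the reference for your punch version.

```json
"args": [
    "--plan", "plan.hjson",
    "--template", "job.template",
    "--deploy-mode", "foreground",
    "--last-committed",
    "--date", "2019-11-01T00:00+0000,2019-11-15T00:00+0000"
]
```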

Let's use some file examples to be clearer:

plan.hjson:

With the help of templating, every minute a job template is rendered with the appropriate metadata and then launched as a PML job.

{
  model:{
    cluster_name: es_search
    index_input: platform-metricbeat-*
    nodes: ["localhost"]
    dates: {
      day: {
        offset: -PT1m
        format: yyyy.MM.dd
      }
      from: {
        offset: -PT1m
        format: yyyy-MM-dd'T'HH:mmZ
      }
      timezone: {
        format: +00:00
      }
      to: {
        format: yyyy-MM-dd'T'HH:mmZ
      }
    }
  }
  plan_settings: {
    cron: "*/1 * * * *"
    delay: PT10s
  }
}
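To make the dates section concrete, here is a minimal Python sketch (an illustration, not the actual plan engine) of the template variables the plan derives at a given tick, with the Java-style date patterns translated to their Python equivalents:

```python
from datetime import datetime, timedelta, timezone

def compute_dates(now):
    """Illustrative stand-in: derive the template variables of the
    'dates' section for one plan tick."""
    one_min = timedelta(minutes=1)  # offset: -PT1m
    return {
        "day":  (now - one_min).strftime("%Y.%m.%d"),          # yyyy.MM.dd
        "from": (now - one_min).strftime("%Y-%m-%dT%H:%M%z"),  # yyyy-MM-dd'T'HH:mmZ
        "to":   now.strftime("%Y-%m-%dT%H:%M%z"),
        "timezone": "+00:00",
    }

tick = datetime(2019, 11, 15, 14, 57, tzinfo=timezone.utc)
tick_vars = compute_dates(tick)
print(tick_vars["from"])  # 2019-11-15T14:56+0000
print(tick_vars["to"])    # 2019-11-15T14:57+0000
```

Each tick thus yields a one-minute window that the job template below injects into its elasticsearch range query.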

job-template.hjson:

{
  job: [
    {
      component: input
      publish: [
        {
          stream: data
        }
      ]
      settings: {
        cluster_name: "{{cluster_name}}"
        source_column: source
        index: "{{index_input}}-{{day}}"
        nodes: {{ nodes | json_encode() }}
        query: {
          range: {
            @timestamp: {
              gte: "{{from}}"
              lt: "{{to}}"
              time_zone: "{{timezone}}"
            }
          }
        }
      }
      type: elastic_batch_input
    }
    {
      component: show
      settings: {
      }
      subscribe: [
        {
          component: input
          stream: data
        }
      ]
      type: show
    }
  ]
}
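To illustrate how the {{...}} placeholders get filled, here is a tiny Python stand-in for the renderer. The real engine uses a Jinja-style template library; this naive string replacement ignores filters such as json_encode and is only meant to show the substitution step:

```python
# Naive stand-in for the plan's Jinja-style renderer (illustration only).
def render(template, variables):
    for key, value in variables.items():
        template = template.replace("{{" + key + "}}", str(value))
    return template

snippet = 'index: "{{index_input}}-{{day}}"  gte: "{{from}}"  lt: "{{to}}"'
model = {
    "index_input": "platform-metricbeat-*",
    "day": "2019.11.15",
    "from": "2019-11-15T14:56+0000",
    "to": "2019-11-15T14:57+0000",
}
print(render(snippet, model))
# index: "platform-metricbeat-*-2019.11.15"  gte: "2019-11-15T14:56+0000"  lt: "2019-11-15T14:57+0000"
```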

How to launch a Plan

Plans default behaviour

In case a plan is executed in --last-committed mode and no persistence key is found in your plan configuration file:

  • default index name for cursor: platform-plan-cursor
  • default hostname for elasticsearch: localhost
  • default port for elasticsearch: 9200
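To override these defaults, a persistence key can be set in the plan configuration file. The fragment below is a sketch only: the inner field names are assumptions for illustration, so refer to the plan reference for the exact schema of your punch version.

```hjson
# Hypothetical persistence override (inner field names are assumptions);
# without this key the defaults listed above apply.
persistence: {
  type: elasticsearch
  index_name: platform-plan-cursor
  nodes: ["localhost"]
  port: 9200
}
```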

As for the model key used to define templating variables, the following names are used by the internal system and will be overridden if used:

  • name: defined in your plan configuration
  • tenant: defined in your plan configuration
  • channel: defined in your plan configuration
  • version: defined in your plan configuration

Using a file path

To execute a plan, you can run these commands from your terminal:

# Default --runtime-environment is set to spark
# others: pyspark_5x (craig) and pyspark (dave)

# foreground/local
$ punchlinectl --plan <plan_path> --template <template_path>

# on a cluster
$ punchlinectl --plan <plan_path> --template <template_path> --deploy-mode cluster --spark-master <master_url>