Executing Plans
This section illustrates the different ways of using the plan module.
Launching mode concepts¶
Since plans were designed for resiliency, a number of launching modes had to be introduced to meet operational users' requirements in a production environment, namely (a configuration sketch follows the list):
- WITH_LAST_COMMITTED_FROM_START_TO_END_DATE: activated when both a start date and a stop date are defined in plan_settings
- WITH_LAST_COMMITTED_FROM_START_DATE: activated when only a start date is defined in plan_settings (in this mode, the date of the last successful execution takes priority if it exists)
- WITH_LAST_COMMITTED_TO_END_DATE: activated when only an end date is defined in plan_settings (in this mode, the date of the last successful execution is taken into account)
- WITH_LAST_COMMITTED: activated when neither a start date nor an end date is defined in plan_settings (in this mode, the recovery date of the last successful execution is taken into account)
- WITHOUT_LAST_COMMITTED: activated when the plan is launched by shiva without the --last-committed argument
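For illustration, here is a minimal sketch of a plan_settings block that defines both bounds and would therefore select WITH_LAST_COMMITTED_FROM_START_TO_END_DATE. The key names start_date and stop_date are assumptions for the sake of the example; refer to the plan configuration reference for the authoritative names.

```hjson
{
  plan_settings: {
    cron: "*/10 * * * *"
    # hypothetical key names: check your punch release for the exact ones
    start_date: "2019-11-01T00:00+0000"
    stop_date: "2019-11-02T00:00+0000"
  }
}
```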
Plan LOG types¶
- PLAN_FIRST_EXECUTION: emitted when no recovery index or document containing the required plan information is found
- PLAN_RESUMED_FROM_CHECKPOINT: emitted when a recovery document containing the metadata needed for the plan to resume is found
- PLAN_IGNORED_LAST_COMMITTED_START_DATE: emitted when the plan is launched in WITH_LAST_COMMITTED_FROM_START_DATE mode
- PLAN_UPTIME: plan metrics, where you can find the uptime value (in seconds) of the plan process
Job LOG types¶
- JOB_STARTED: emitted when a job executed by a plan begins
- JOB_ENDED: emitted when a job executed by a plan ends
- JOB_FAILED: emitted when a job executed by a plan fails to complete
- ERROR: emitted when something unexpected happens
This section explains how to launch punchline plans in practice; the useful commands to keep in mind are described below.
A basic understanding of Spark concepts is assumed. If that is not the case, please go through the Plan concepts section first.
Example LOG structure¶
```json
{
  "content": {
    "unit": "seconds",
    "event_type": "PLAN_UPTIME",
    "job_runtime_id": "5b5ffef8-eca4-45e0-a159-2738839f14c7",
    "level": "INFO",
    "rate": 10,
    "logger": "org.thales.punch.plan.api.PlanLauncher",
    "channel": "default",
    "launch_mode": "WITH_LAST_COMMITTED",
    "plan.runtime.id": "default",
    "start_date": "2019-11-15T14:57:15.809Z",
    "uptime": 39
  },
  "target": {
    "cluster": "foreground",
    "type": "spark"
  },
  "init": {
    "process": {
      "name": "plan_controller",
      "id": "5624@PUNCH-LPT35-003"
    },
    "host": {
      "name": "PUNCH-LPT35-003"
    },
    "user": {
      "name": "jonathan"
    }
  },
  "platform": {
    "channel": "default",
    "id": "default",
    "job": "no_shiva",
    "tenant": "default"
  },
  "type": "punch",
  "vendor": "thales",
  "@timestamp": "2019-11-15T14:57:15.809Z",
  "fields": {
    "@timestamp": [
      "2019-11-15T14:57:15.809Z"
    ],
    "content.start_date": [
      "2019-11-15T14:57:15.809Z"
    ]
  }
}
```
Common fields (a quick filtering example follows the list):
- start_date: refers to the date when the plan or a PunchLine was executed
- job_runtime_id: runtime_id of the PunchLine executed by the plan. The runtime_id (generated if not set by the user) is unique as long as the plan process is running. If the plan dies and later resumes, a new runtime_id is assigned to all the PunchLines executed by the plan.
- channel: name of the channel the plan is located in
- ...
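As a quick check, assuming the plan writes these JSON documents one per line to a log file (the file name plan.log is illustrative), you can filter them with jq:

```sh
# print the uptime (in seconds) of every PLAN_UPTIME event
jq 'select(.content.event_type == "PLAN_UPTIME") | .content.uptime' plan.log
```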
Executing a channel containing a plan job with shiva¶
Why shiva?
Plans guarantee the resiliency of the jobs they execute, whereas shiva guarantees the resiliency of the plans themselves.
Channel structure example¶
Directory structure¶
```sh
tree $PATH_TO_CHANNELS_DIR/my_plan1
├── channel_structure.json
├── job.template
├── plan.hjson
└── README.md
```
Channel structure configuration example¶
```json
{
  "version": "5",
  "start_by_tenant": false,
  "stop_by_tenant": true,
  "jobs": [
    {
      "type": "shiva",
      "name": "plan-example-1",
      "command": "plan",
      "args": [
        "--plan",
        "plan.hjson",
        "--template",
        "job.template",
        "--deploy-mode",
        "foreground",
        "--last-committed"
      ],
      "resources": [
        "plan.hjson",
        "job.template"
      ],
      "cluster": "common",
      "shiva_runner_tags": [
        "standalone"
      ]
    }
  ]
}
```
Executing a plan in a channel structure¶
```sh
punchctl start --channel my_plan1
```
Stopping a plan in a channel structure¶
```sh
punchctl stop --channel my_plan1
```
Executing a plan with punchline shell¶
Example File Configuration¶
In addition to jobs, the PunchPlatform punchline client also lets you execute Plans.
In brief, a Plan provides a way of executing jobs periodically with the help of templating.
Use case: fetching data from an Elasticsearch cluster every 10 minutes.
To accomplish this, we will make use of the two files below: a plan and a job template.
Definition: think of the plan as the file that holds all the metadata needed for your job, for instance templating variables. The job template, in turn, is the file describing your use case. In other words, a job template relies on a plan for its scheduling information (intervals, etc.) and sink metadata (cluster name, ports, etc.). Periodic job execution is thus achieved with the help of templating, as illustrated by the snippet below.
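As a minimal illustration of the rendering mechanism (the full examples follow further down), the plan's model values are substituted into the {{...}} placeholders of the template before each run. Assuming index_input resolves to platform-metricbeat-* and day to 2019.11.15, a template line would render like this:

```hjson
# template fragment, taken from the job template below
index: "{{index_input}}-{{day}}"

# what the plan generates at execution time (illustrative values)
index: "platform-metricbeat-*-2019.11.15"
```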
Note
Jobs and Plans are compatible with both JSON and HJSON
Note
In a production environment, a plan can keep track of its last successful event and resume from it in case of failure. Use --last-committed to activate this option. This feature is only available when the plan is executed within a channel.
Note
In a production environment, you can launch a plan between two specific dates with the --date parameter. Both dates are passed as strings separated by a comma (,), and the date format must be yyyy-MM-dd'T'HH:mmZ. This feature is only available when the plan is executed within a channel.
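For instance, a sketch of such a launch, following the flag and format described above (the dates themselves are illustrative, and the assumption here is that both bounds go into a single comma-separated argument):

```sh
# run the plan between two dates, comma separated, format yyyy-MM-dd'T'HH:mmZ
punchlinectl --plan plan.hjson --template job.template \
    --date "2019-11-01T00:00+0000,2019-11-02T00:00+0000"
```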
Let's walk through some example files to make this clearer.
plan.hjson:
With the help of templating, every minute a job template is rendered with the appropriate metadata and then launched as a PML job.
```hjson
{
  model: {
    cluster_name: es_search
    index_input: platform-metricbeat-*
    nodes: ["localhost"]
    dates: {
      day: {
        offset: -PT1m
        format: yyyy.MM.dd
      }
      from: {
        offset: -PT1m
        format: yyyy-MM-dd'T'HH:mmZ
      }
      timezone: {
        format: +00:00
      }
      to: {
        format: yyyy-MM-dd'T'HH:mmZ
      }
    }
  }
  plan_settings: {
    cron: "*/1 * * * *"
    delay: PT10s
  }
}
```
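To make the date arithmetic concrete, assuming the offsets are applied to the scheduled execution time, a run fired at 2019-11-15T14:58+00:00 would render the date variables roughly as follows (illustrative values):

```hjson
{
  day: "2019.11.15"             # execution time minus PT1m, formatted yyyy.MM.dd
  from: "2019-11-15T14:57+0000" # execution time minus PT1m
  to: "2019-11-15T14:58+0000"   # execution time itself, no offset applied
}
```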
job-template.hjson:
```hjson
{
  job: [
    {
      component: input
      publish: [
        {
          stream: data
        }
      ]
      settings: {
        cluster_name: "{{cluster_name}}"
        source_column: source
        index: "{{index_input}}-{{day}}"
        nodes: {{ nodes | json_encode() }}
        query: {
          range: {
            @timestamp: {
              gte: "{{from}}"
              lt: "{{to}}"
              time_zone: "{{timezone}}"
            }
          }
        }
      }
      type: elastic_batch_input
    }
    {
      component: show
      settings: {
      }
      subscribe: [
        {
          component: input
          stream: data
        }
      ]
      type: show
    }
  ]
}
```
How to launch a Plan¶
Plans default behaviour¶
In case a plan is executed in --last-committed mode and no persistence key is found in your plan configuration file, the following defaults apply (a quick check follows the list):
- default index name for the cursor: platform-plan-cursor
- default Elasticsearch hostname: localhost
- default Elasticsearch port: 9200
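With these defaults in place, a quick way to inspect the stored cursor documents is a plain Elasticsearch search request:

```sh
# list the plan cursor documents stored under the default settings
curl "localhost:9200/platform-plan-cursor/_search?pretty"
```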
As for the model key, which defines templating variables, the following names are reserved by the internal system and will be overridden if used (see the sketch after this list):
- name: defined in your plan configuration
- tenant: defined in your plan configuration
- channel: defined in your plan configuration
- version: defined in your plan configuration
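Since these reserved names are injected automatically, a job template can reference them directly without declaring them in model. A minimal sketch (the index value shown is purely illustrative):

```hjson
{
  settings: {
    # {{tenant}} and {{channel}} are injected by the plan, not declared in model
    index: "output-{{tenant}}-{{channel}}"
  }
}
```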
Using a file path¶
To execute a plan, you can run one of the following commands from your terminal:
```sh
# Default --runtime-environment is set to spark
# others: pyspark_5x (craig) and pyspark (dave)

# foreground/local
$ punchlinectl --plan <plan_path> --template <template_path>

# on a cluster
$ punchlinectl --plan <plan_path> --template <template_path> --deploy-mode cluster --spark-master <master_url>
```