Plans¶
Plans are special Punch applications that periodically execute punchlines. Refer to the plan concept overview chapter to have a quick understanding.
Batch punchlines are commonly used to perform aggregations, extract data, train models, and so on. These tasks must be scheduled periodically.
Plans make it possible to schedule a batch processing punchline. Because a batch processing pipeline usually runs over a given time range, a plan updates that time range every time it runs.
In short, this is what a plan does: "every hour, execute that batch punchline over the last 4 hours of data".
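The "every hour, over the last 4 hours" behaviour can be pictured with a short Python sketch. It is only an illustration of the sliding time range, not the Punch implementation: the function name `plan_windows` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def plan_windows(first_run, period, lookback, runs):
    """Return the (from, to) range a punchline would receive at each run.

    Hypothetical helper: each run happens `period` after the previous one
    and covers the `lookback` duration that ends at the run time.
    """
    windows = []
    for i in range(runs):
        run_time = first_run + i * period
        windows.append((run_time - lookback, run_time))
    return windows


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Every hour, process the last 4 hours of data.
    for frm, to in plan_windows(start, timedelta(hours=1), timedelta(hours=4), 3):
        print(frm.isoformat(), "->", to.isoformat())
```

Note that consecutive ranges overlap whenever the lookback is longer than the period, which is the case in the example above.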
Configuration¶
In this quick tour we will have a look at the simplest possible plan.
cd $PUNCHPLATFORM_CONF_DIR/samples/plans/basic
There you have a plan.yaml and the punchline.template. Have a look at their content. Here is the plan:
---
version: "6.0"
model:
  dates: # Date generator
    from:
      offset: -PT1m
      format: yyyy-MM-dd'T'HH:mmZ
    to:
      format: yyyy-MM-dd'T'HH:mmZ
settings:
  cron: "*/1 * * * *"
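A plan is in charge of generating dates. These dates are then consumed by a punchline.
The -PT1m offset is an ISO 8601-style duration: applied to the current time, it means "now minus one minute". The to date has no offset, so it is simply the current time.
As for the cron expression, it means: "run the punchline every minute".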
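To make the date generation concrete, here is a minimal Python sketch of how an offset such as -PT1m and the yyyy-MM-dd'T'HH:mmZ pattern could turn into the two strings handed to the punchline. It is an illustration under assumptions: `parse_offset` and `generate_dates` are hypothetical names, only hour, minute and second offsets are handled, and the pattern is assumed to behave like the Python format `%Y-%m-%dT%H:%M%z`.

```python
import re
from datetime import datetime, timedelta, timezone

# Restricted duration syntax: optional sign, "PT", then hours/minutes/seconds.
# Matching is case-insensitive so that "-PT1m" is accepted as written in the plan.
_DURATION = re.compile(
    r"^([+-]?)PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$", re.IGNORECASE
)

# Assumed Python equivalent of the plan's yyyy-MM-dd'T'HH:mmZ pattern.
DATE_FORMAT = "%Y-%m-%dT%H:%M%z"


def parse_offset(text):
    """Convert a duration string such as '-PT1m' into a timedelta."""
    match = _DURATION.match(text)
    if not match:
        raise ValueError(f"unsupported offset: {text!r}")
    sign, hours, minutes, seconds = match.groups()
    delta = timedelta(
        hours=int(hours or 0), minutes=int(minutes or 0), seconds=int(seconds or 0)
    )
    return -delta if sign == "-" else delta


def generate_dates(now, from_offset="-PT1m", to_offset=None):
    """Return the formatted 'from' and 'to' dates for one run (hypothetical)."""
    frm = now + parse_offset(from_offset)
    to = now + parse_offset(to_offset) if to_offset else now
    return {"from": frm.strftime(DATE_FORMAT), "to": to.strftime(DATE_FORMAT)}


if __name__ == "__main__":
    print(generate_dates(datetime.now(timezone.utc)))
```

With the sample plan, each run would therefore cover the minute that has just elapsed.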
Here is the punchline template file content:
---
type: punchline
runtime: spark
version: "6.0"
dag:
  # Will generate dates at each run
  - type: dataset_generator
    component: input
    settings:
      input_data:
        - date: "{{ from }}"
          name: from_date
        - date: "{{ to }}"
          name: to_date
    publish:
      - stream: data
  # Show output
  - type: show
    component: show
    settings:
      truncate: false
    subscribe:
      - component: input
        stream: data
This is a simple punchline that prints the columns generated by the input node to stdout. These columns contain the dates generated by the plan.
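The substitution of the `{{ from }}` and `{{ to }}` placeholders can be sketched in a few lines of Python. This is a stand-in for illustration only: the template engine actually used by Punch is not described here, the `render` function is hypothetical, and the date values are made up.

```python
import re

# Fragment of the punchline template shown above.
TEMPLATE = """\
input_data:
  - date: "{{ from }}"
    name: from_date
  - date: "{{ to }}"
    name: to_date
"""

# Matches {{ name }} with optional spaces around the name.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace every {{ name }} placeholder with values[name].

    Raises KeyError if the template uses a name missing from `values`.
    """
    return _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


if __name__ == "__main__":
    # Illustrative dates, in the format configured in the plan.
    dates = {"from": "2024-01-01T12:29+0000", "to": "2024-01-01T12:30+0000"}
    print(render(TEMPLATE, dates))
```

At each run the plan produces a fresh set of values, so the same template yields a different concrete punchline every time.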
Try it¶
To start this plan here is the command line:
planctl --tenant mytenant start --plan plan.yaml --template template.yaml
You will see the result of the Spark punchline printed every minute. Check the dataset column in particular: it contains your dates. Every punchline property can be templated using dates or other values.
Important
Punch plans benefit from many production-grade features such as persistent cursors, high availability and monitoring. These are described in the Reference Guide chapter.
Plan Alternatives ?¶
Plans are both simple and powerful. A number of well-known technologies provide the same sort of resilient scheduling power, for example Apache Airflow or Argo Workflows on Kubernetes. These technologies are more complex and heavyweight. Punch plans are tiny and suit small, medium or large-scale platforms.