Plans

Plans are special Punch applications that periodically execute punchlines. Refer to the plan concept overview chapter for a quick introduction.

Batch punchlines are commonly used to perform aggregations, extract data, or train models. These tasks must be scheduled periodically.

Plans make it possible to schedule a batch punchline. Because a batch pipeline usually runs over a given time range, a plan updates that time range every time it runs.

In short, this is what a plan does: "every hour, execute that batch punchline over the last 4 hours of data".

Configuration

In this quick tour we will have a look at the simplest possible plan.

cd $PUNCHPLATFORM_CONF_DIR/samples/plans/basic

There you have a plan.yaml and the punchline.template. Have a look at their content. Here is the plan:

---
version: "6.0"
model:
  dates: # Date generator 
    from:
      offset: -PT1m 
      format: yyyy-MM-dd'T'HH:mmZ
    to:
      format: yyyy-MM-dd'T'HH:mmZ 
settings:
  cron: "*/1 * * * *"

A plan is in charge of generating dates. These dates are then consumed by a punchline.
The -PT1m offset uses the ISO 8601 duration notation and means "now minus one minute". The to date has no offset, so it resolves to the current time. As for the cron expression, it means "run the punchline every minute".
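
As an illustration of the "every hour, over the last 4 hours" case mentioned in the introduction, the same plan could be adapted as sketched below. This is not one of the shipped samples: it assumes the offset field accepts any ISO 8601 duration (here -PT4h) and that the cron field follows the standard five-field syntax.

```yaml
---
version: "6.0"
model:
  dates:
    from:
      # Assumption: start of the range is four hours before the run time
      offset: -PT4h
      format: yyyy-MM-dd'T'HH:mmZ
    to:
      # No offset: end of the range is the run time itself
      format: yyyy-MM-dd'T'HH:mmZ
settings:
  # Standard cron: at minute 0 of every hour
  cron: "0 * * * *"
```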

Here is the punchline template file content:

---
type: punchline
runtime: spark
version: "6.0"
dag:
  # Will generate dates at each run
  - type: dataset_generator
    component: input
    settings:
      input_data:
        - date: "{{ from }}"
          name: from_date
        - date: "{{ to }}"
          name: to_date
    publish:
      - stream: data

  # Show output
  - type: show
    component: show
    settings:
      truncate: false
    subscribe:
      - component: input
        stream: data

This simple punchline prints the columns generated by the input node to stdout. These columns contain the dates generated by the plan.
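
To make the templating concrete, here is a sketch of how the input node settings might look once the plan has resolved the template, for a hypothetical run at 10:00 UTC on 2024-01-01. The timestamps are illustrative only: the actual values depend on when the plan fires and on the platform time zone.

```yaml
# Hypothetical rendering of the input_data section for a run at 10:00 UTC
input_data:
  - date: "2024-01-01T09:59+0000"   # {{ from }}: run time minus one minute
    name: from_date
  - date: "2024-01-01T10:00+0000"   # {{ to }}: run time
    name: to_date
```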

Try it

To start this plan, run the following command:

planctl --tenant mytenant start --plan plan.yaml --template template.yaml

You will see the result of the Spark punchline printed every minute. Check the dataset column in particular: that is where your dates appear. Every punchline property can be templated using dates or other values.

Important

Punch plans benefit from many production-grade features such as persistent cursors, high availability and monitoring. These are described in the Reference Guide chapter.

Plan Alternatives?

Plans are both simple and powerful. A number of well-known technologies provide the same sort of resilient scheduling power, for example Apache Airflow or Argo Workflows on Kubernetes. These technologies are more complex and heavyweight. Punch plans are lightweight and suit tiny, small, medium or large-scale platforms.

Resources