Plans are punch applications that provide periodic scheduling of punchlines. Like a regular cron tool, a plan triggers punchlines at regular intervals. Unlike cron, however, it provides:
- resiliency: if you restart a plan, even on another server, it properly resumes processing from the last saved cursor.
- logging: the state of your plan application, its uptime, etc.
- configuration: a plan is easily defined as (yet another) HJSON or JSON file.
- templating: every punchline executed by a plan is typically generated first from a template, so that each run acts on the right input datasets.
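To make the resiliency point concrete, here is a minimal sketch of cursor persistence. It assumes the cursor is simply the end timestamp of the last successfully processed window and stores it in a local JSON file. The file, `save_cursor` and `load_cursor` are hypothetical stand-ins, not the Punch API; a plan resuming on another server needs storage that both servers can reach.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def save_cursor(path: str, cursor: datetime) -> None:
    """Persist the end of the last successfully processed window.

    Writes to a temporary file first, then renames it over the target so a
    crash never leaves a half-written cursor behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"cursor": cursor.isoformat()}, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_cursor(path: str, default: datetime) -> datetime:
    """Return the saved cursor, or `default` when the plan starts for the first time."""
    try:
        with open(path) as f:
            return datetime.fromisoformat(json.load(f)["cursor"])
    except FileNotFoundError:
        return default
```

On restart, the plan would call `load_cursor` and continue from the returned timestamp instead of from "now". That is why no window is lost while the plan is down.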
Templating is most often used to generate dates. Why dates? Because the essential parameters for retrieving input data are time-based.
The following diagram illustrates how it works. Your punchline configuration file is actually a template file. The plan periodically computes date values, generates actual job files from the template, and schedules each of them for execution.
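The template-then-render step can be sketched with Python's `string.Template`. The `${start}` / `${end}` placeholders and the HJSON-like body are illustrative assumptions and do not reproduce the actual template syntax used by Punch plans. The point is only that the plan computes dates and substitutes them into a fixed template before each run.

```python
from datetime import datetime, timedelta, timezone
from string import Template

# Hypothetical punchline template; the ${...} placeholders are illustrative only.
PUNCHLINE_TEMPLATE = Template("""{
  input: { index: "logs-*", from: "${start}", to: "${end}" }
}""")


def render_job(template: Template, now: datetime, period: timedelta) -> str:
    """Fill the template with the window [now - period, now) as ISO 8601 strings."""
    start = now - period
    return template.substitute(start=start.isoformat(), end=now.isoformat())


job = render_job(PUNCHLINE_TEMPLATE,
                 datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
                 timedelta(hours=1))
print(job)
```

Each tick of the plan produces a new, fully concrete job file from the same template. Only the dates differ between runs.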
Case 1: you want to launch a punchline every minute to read data from an external source (e.g. Kafka), enrich its content with meaningful information, and then store the result on the file system.
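A cron-like "every minute" trigger can be thought of as cutting time into fixed windows. The sketch below is a hypothetical helper, not part of Punch. It assumes the cursor is already aligned on a minute boundary and lists every complete window elapsed since that cursor, so one punchline run would be scheduled per window.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def due_windows(cursor: datetime, now: datetime,
                period: timedelta = timedelta(minutes=1)) -> List[Tuple[datetime, datetime]]:
    """Return every complete [start, end) window between `cursor` and `now`.

    A window that has not fully elapsed yet is not returned; it will be
    picked up on a later call.
    """
    windows = []
    start = cursor
    while start + period <= now:
        windows.append((start, start + period))
        start += period
    return windows
```

Returning the whole backlog, rather than only the latest window, means a plan that was paused for a few minutes would catch up instead of silently skipping data.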
Case 2: you want to launch a punchline every few hours that contains a node (e.g. Elasticsearch) which must query its data for a specific time range. That time range has to be calculated from the current time. Plans let you do this with their templating features!
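For the Elasticsearch case, the value the plan computes ends up as a time range in the node's query. Below is a minimal sketch that builds a standard Elasticsearch `range` query body covering the last `lookback` hours. The `@timestamp` field name and the `range_query` helper are assumptions for illustration, not settings taken from Punch.

```python
import json
from datetime import datetime, timedelta, timezone


def range_query(now: datetime, lookback: timedelta, field: str = "@timestamp") -> dict:
    """Build an Elasticsearch range query selecting documents in [now - lookback, now)."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": (now - lookback).isoformat(),  # inclusive lower bound
                    "lt": now.isoformat(),                # exclusive upper bound
                }
            }
        }
    }


query = range_query(datetime(2024, 1, 1, 6, tzinfo=timezone.utc), timedelta(hours=6))
print(json.dumps(query, indent=2))
```

Using an inclusive lower bound and an exclusive upper bound lets consecutive runs tile the timeline without overlapping or leaving gaps.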
Case 3: this time the need is similar to case 1, except that the execution of the punchline is critical; in other words, it must respect an SLA (e.g. at-least-once processing).
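One common way to obtain at-least-once behaviour is to advance the cursor only after a run has succeeded and to retry failed windows. The sketch below shows that rule in isolation. `run_at_least_once`, `run_job` and `max_attempts` are hypothetical names, and this is a generic technique, not a description of how Punch implements it.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple

Window = Tuple[datetime, datetime]


def run_at_least_once(cursor: datetime, now: datetime, period: timedelta,
                      run_job: Callable[[Window], bool],
                      max_attempts: int = 3) -> datetime:
    """Run one job per elapsed window and return the new cursor.

    The cursor moves past a window only once its job reports success. If a
    window still fails after `max_attempts`, processing stops there so the
    next invocation retries it: a window may run more than once, but none
    is skipped.
    """
    while cursor + period <= now:
        window = (cursor, cursor + period)
        # any() stops calling run_job at the first successful attempt.
        if not any(run_job(window) for _ in range(max_attempts)):
            break  # keep the cursor on the failed window
        cursor = window[1]  # commit progress only after success
    return cursor
```

The design choice is to trade possible duplicates for the guarantee that no window is lost. Downstream processing should therefore tolerate replays, for example by writing idempotently.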