Plans

Concepts

Plans is a library that was developed to make up for features missing from existing cron-like tools. Like a regular cron tool, it lets you trigger PunchLine jobs at regular intervals, and it adds features such as:

  • resiliency
  • logging: state of your plan application, uptime, etc.
  • configuration based: define the available settings in a JSON-like format and the plan is ready to run
  • templating: in some cases you want to update variables with an advancing watermark, for instance incrementing a date that a PML job uses for fetching data

Note

These features are built on top of Elasticsearch.

Once you have a valid PML job configuration, the next step is to schedule it for automatic execution. The PunchPlatform enables you to define periodic scheduling of your jobs by using configuration files called plans.

Put simply, a plan generates dates, then schedules your job using these dates. Why dates? Because the essential parameters for retrieving input data are based on time.

The following diagram illustrates the way it works. Your job configuration file is actually a template file. The plan is in charge of periodically defining date values, then generating actual job files from the template, which in turn are scheduled for execution.

image
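The date generation step can be pictured with a short Python sketch. This is only an illustration of the idea, not the plan implementation: the function names `execution_dates` and `date_window` are hypothetical, and a fixed interval stands in for the cron expression.

```python
from datetime import datetime, timedelta, timezone


def execution_dates(first, interval, count):
    """Yield `count` execution dates, `interval` apart, starting at `first`.

    Hypothetical stand-in for the cron schedule of a plan.
    """
    for i in range(count):
        yield first + i * interval


def date_window(execution_date, lookback):
    """Return the (start, end) time range a job launched at `execution_date` would read."""
    return execution_date - lookback, execution_date


first_tick = datetime(2019, 11, 14, 13, 0, tzinfo=timezone.utc)
one_minute = timedelta(minutes=1)

# Each tick produces a new window; together they form an advancing watermark.
windows = [date_window(tick, one_minute)
           for tick in execution_dates(first_tick, one_minute, 3)]

for start, end in windows:
    print(start.isoformat(), "->", end.isoformat())
```

Each window starts where the previous one ended, which is what lets successive job runs cover the input data without gaps or overlaps.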

Quick Overview

Plan Configuration structure

Below is an example of what a configuration.plan file should look like:

{
  model:{
    template_var1: helloworld
    dates: {
       template_var2: {
         offset: -PT1m
         format: yyyy.MM.dd
       }
    }
  }
  plan_settings: {
    # start not required
    start: 2019-11-14T13:00:18.940Z
    # stop not required
    stop: 2059-11-14T13:00:18.940Z
    # delay not required 
    delay: PT10s
    # cron required
    cron: "*/1 * * * *"
    # required with --last-committed
    # default to information based on your punchplatform.properties
    persistence: {
        type: elasticsearch
        index_name: platform-plan-cursor
        credentials: {
            es_search: {
              token: mytoken
              token_type: ApiKey
            }
        }
    }
  }
  # not required
  # generates logging and metrics information in an index...
  metrics: {
    reporters: [
      {
        type: elasticsearch
        cluster_name: es_search
        index_name: example_plan_metrics
      }
    ]
  }
}
  • model: key-value pairs defining the variables to be substituted in your job.pml configuration file.
  • model.dates: reserved dictionary containing the dates to be substituted in your job.pml configuration file. Each date is itself a dictionary containing two key-value pairs (format and offset):
my_date: {
    format: yyyy-MM-dd'T'HH:mmZ
    offset: -PT2m
}
  • plan_settings: configures how your plan starts and how checkpointing is persisted. Available settings are start, stop, delay, cron and persistence (a dictionary).

More on Dates

This section explains how dates are defined. Each date is a configuration element with two fields:

{
    # Mandatory : ISO 8601 format used for generating your date value
    format: yyyy-MM-dd'T'HH:mmZ

    # Optional : you can define an offset relative to the execution date.
    # This will be used to shift the date.
    #
    # Here we generate a date with a two minutes shift from the cron date.
    offset: -PT2m
}

The offset is written as an ISO 8601 duration (for example -PT2m). The format is a date pattern; the pattern yyyy-MM-dd'T'HH:mmZ used above yields an ISO 8601 style timestamp.

Execution Date                  offset   format                Result
2017-12-05T10:30:00.000+01:00   -PT2m    yyyy-MM-dd'T'HH:mmZ   2017-12-05T10:28+0100
2017-12-05T10:30:00.000+01:00   (none)   +01:00                +01:00
2017-12-05T10:30:00.000+01:00   (none)   yyyy.MM.dd            2017.12.05
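The combination of offset and format can be reproduced with a small Python sketch. It is an approximation under stated assumptions: `parse_offset` and `shifted_date` are hypothetical helpers, only the hour, minute and second parts of an ISO 8601 duration are handled, and Python strftime patterns are used in place of the yyyy-MM-dd style patterns that plans expect.

```python
import re
from datetime import datetime, timedelta

# Minimal ISO 8601 duration matcher: optional sign, then PT with H, M and S parts.
_DURATION = re.compile(r"^([+-]?)PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$", re.IGNORECASE)


def parse_offset(text):
    """Convert an offset such as '-PT2m' into a timedelta (hypothetical helper)."""
    match = _DURATION.match(text)
    if match is None or not any(match.group(i) for i in (2, 3, 4)):
        raise ValueError(f"unsupported offset: {text!r}")
    sign = -1 if match.group(1) == "-" else 1
    hours, minutes, seconds = (int(part or 0) for part in match.group(2, 3, 4))
    return sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)


def shifted_date(execution_date, strftime_format, offset=None):
    """Shift the execution date by the optional offset, then format it."""
    if offset is not None:
        execution_date = execution_date + parse_offset(offset)
    return execution_date.strftime(strftime_format)


execution = datetime.fromisoformat("2017-12-05T10:30:00.000+01:00")

# Python equivalent of the pattern yyyy-MM-dd'T'HH:mmZ with a two minute shift.
print(shifted_date(execution, "%Y-%m-%dT%H:%M%z", offset="-PT2m"))  # 2017-12-05T10:28+0100

# Python equivalent of the pattern yyyy.MM.dd without any offset.
print(shifted_date(execution, "%Y.%m.%d"))  # 2017.12.05
```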

Note

Refer to the TimeZones reference for the available time zones.

Plan Configuration

Plan configuration files contain all the information required to iteratively generate jobs with the right date and time ranges.

plan_settings

{
    ...
    plan_settings: {
        // section we are going to describe
        ...
        persistence: {
            // subsection we are going to describe
        }
    }
}
  • cron : String

    Mandatory: Cron expression defining the scheduling

  • start: String

    Optional: Start date in the format "yyyy-MM-dd'T'HH:mmZ". Defaults to now.

  • stop : String

    Optional: Stop date in the format "yyyy-MM-dd'T'HH:mmZ"

  • delay : String

    Optional: Offset between the execution date and the job launch

  • persistence: Dictionary

    Where checkpoint information will be saved

plan_settings: {
    ...
    # credentials are not required
    persistence: {
        type: <String> -> elasticsearch
        index_name: <String> -> index name prefix
        credentials: <Dictionary> {
            <es_cluster_id>: {
                # either
                token: <String> your_token
                token_type: <String> -> ApiKey
                # or
                user: your_user
                password: your_password
            }
        }
    }
}
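
One way to read the start, stop and delay settings described above is sketched below in Python. The helper `launch_time` is hypothetical, and treating start and stop as inclusive bounds on the execution date is an assumption made for the example, not documented behavior.

```python
from datetime import datetime, timedelta, timezone


def launch_time(execution_date, delay, start=None, stop=None):
    """Return when the job would be launched, or None if the plan is not active.

    Assumption: the plan only fires for execution dates within [start, stop],
    and the job is launched `delay` after the execution date.
    """
    if start is not None and execution_date < start:
        return None
    if stop is not None and execution_date > stop:
        return None
    return execution_date + delay


start = datetime(2019, 11, 14, 13, 0, 18, 940000, tzinfo=timezone.utc)
delay = timedelta(seconds=10)  # corresponds to delay: PT10s

before_start = datetime(2019, 11, 14, 13, 0, tzinfo=timezone.utc)
after_start = datetime(2019, 11, 14, 13, 1, tzinfo=timezone.utc)

print(launch_time(before_start, delay, start=start))  # None
print(launch_time(after_start, delay, start=start))   # 2019-11-14 13:01:10+00:00
```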

model

{
    ...
    model: {
        // section we are going to describe
        dates: {
            // subsection we are going to describe
        }
    }
}
  • dates : Dictionary of dates

    Each key is the name of a date. Use that name as the template variable in your job.template configuration file.

...
model: {
    dates: {
        mydate1: {
            format: <String> -> a valid datetime format (ISO 8601)
            offset: <String> -> date shift (-PT2m or +PT2m)
        }
    }
}
  • model.<anything>: Key-Value

    Any key-value pair defined at the root level of the model key is added to the templating context. These pairs are then substituted in your job.template configuration file.

model: {
    index_name: myindex
    port: 9200
    ...
    template10: template10_value
    dates: {
        ...
    }
}

Example based on the model section

# given the job.template below and the plan configuration above
{
    jobs: [
        {
            type: elastic_batch_input
            settings: {
                port: {{ port }}
                index: {{ index_name }}
                query: {
                    query: {
                        bool: {
                            must: [
                                {
                                    range: {
                                        @timestamp: {
                                            gte: {{ mydate1 }}
                                            lt: {{ mydate2 }}
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
            ...
        }
        ...
    ]
}

Note

The library used for templating is Jtwig, a template engine similar to Jinja2, together with its JSON extension.
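
To illustrate how the model feeds the job template, here is a minimal Python sketch of the substitution step. It does not use Jtwig: a regular expression stands in for the template engine, the helpers `build_context` and `render` are hypothetical, and the date values are assumed to have already been computed by the plan.

```python
import re

# Matches {{ name }} placeholders, with optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def build_context(model, resolved_dates):
    """Merge root-level model keys with the already computed date values."""
    context = {key: value for key, value in model.items() if key != "dates"}
    context.update(resolved_dates)
    return context


def render(template, context):
    """Replace every {{ name }} placeholder with its value from the context."""
    def substitute(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"no value for template variable {name!r}")
        return str(context[name])
    return _PLACEHOLDER.sub(substitute, template)


model = {"index_name": "myindex", "port": 9200, "dates": {}}
# Hypothetical date values, as the plan would produce them for one execution.
resolved_dates = {"mydate1": "2017-12-05T10:28+0100", "mydate2": "2017-12-05T10:30+0100"}

template = "port: {{ port }}\nindex: {{ index_name }}\ngte: {{ mydate1 }}\nlt: {{ mydate2 }}"
rendered = render(template, build_context(model, resolved_dates))
print(rendered)
```

Raising an error on an unknown variable is a design choice of this sketch; it makes a typo in the template visible instead of silently producing an incomplete job file.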

Plan Execution

To learn how to use PunchPlatform Plan in practice, please refer to the dedicated Plans Executions operations guide.