Plans

Concepts

Plans is a library that was developed to make up for the features missing from existing cron-like tools. Like a regular cron tool, it enables you to trigger PunchLine jobs at a regular interval, with the addition of features such as:

  • resiliency
  • logging: state of your plan application, uptime, etc.
  • configuration-based: just define one of the available settings in a JSON-like format and it just works!
  • templating: in certain cases, you want to update variables with an advancing watermark, for instance incrementing a date that should be applied to a PML job for fetching data.

Note

These features were built on top of Elasticsearch.

Once you have a valid PunchLine job configuration, your next issue is to schedule it for automatic execution. The PunchPlatform enables you to define periodic scheduling of your jobs using configuration files called plans.

Put simply, a plan is about generating dates, then scheduling your job using these dates. Why dates? Because the essential parameters for retrieving input data are based on time.

The following diagram illustrates the way it works. Your job configuration file is actually a template file. The plan is in charge of periodically defining date values, then generating actual job files from the templates, which in turn are scheduled for execution.

image

Quick Overview

Plan Configuration structure

Below is an example of what a configuration.plan file should look like:

{
  tenant: mytenant
  channel: atestchannel
  version: "5.0"
  name: configuration.plan
  model:{
    template_var1: helloworld
    dates: {
       template_var2: {
         offset: -PT1m
         format: yyyy.MM.dd
       }
    }
  }
  plan_settings: {
    # start not required
    start: 2019-11-14T13:00:18.940Z
    # stop not required
    stop: 2059-11-14T13:00:18.940Z
    # delay not required 
    delay: PT10s
    # cron required
    cron: "*/1 * * * *"
    # required with --last-committed
    # default to information based on your punchplatform.properties
    persistence: [
        {
            type: elasticsearch
            index_name: platform-plan-cursor
            es_cluster: es_search
            # Nodes parameter takes priority over es_cluster
            nodes: [
                {
                    host: localhost
                    port: 9200
                }
            ]
            ssl: false
            credentials: {
                token: mytoken
                token_type: ApiKey
            }
        }
    ]
  }
  # not required
  # generates logging and metrics information in an index...
  metrics: {
    reporters: [
      {
        type: elasticsearch
        cluster_name: es_search
        index_name: example_plan_metrics
      }
    ]
  }
}
  • tenant: (mandatory) name of the tenant that will be executing the plan
  • version: (mandatory) version of your Platform
  • channel: (mandatory) name of the channel in which the plan will be executed
  • name: (mandatory) a short name specified in your channel structure
  • model: key-value pairs defining the variables to be substituted in your job.pml configuration file.
  • model.dates: reserved dictionary containing the dates to be substituted in your job.pml configuration file. Each date is itself a dictionary containing two key-value pairs:
my_date: {
    format: yyyy-MM-dd'T'HH:mmZ
    offset: -PT2m
}
  • plan_settings: configures how your plan starts and how checkpointing is persisted. Available settings are: start, stop, delay, cron and persistence (a list of dictionaries)
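
To make the timing settings more concrete, here is a minimal Python sketch of how cron ticks, the start/stop window and delay could interact. It is not the plan engine itself: the function name `execution_dates` is hypothetical, only "every N minutes" cron expressions are modeled, and treating start and stop as inclusive bounds is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def execution_dates(start: datetime, stop: datetime,
                    every_minutes: int = 1) -> Iterator[datetime]:
    """Yield minute-aligned execution dates between start and stop.

    Stands in for a cron expression such as "*/1 * * * *".
    Assumption: start and stop are both inclusive.
    """
    tick = start.replace(second=0, microsecond=0)
    if tick < start:
        # The first tick cannot happen before the start date.
        tick += timedelta(minutes=1)
    while tick <= stop:
        if tick.minute % every_minutes == 0:
            yield tick
        tick += timedelta(minutes=1)


# Values borrowed from the example plan: start date and delay (PT10s).
start = datetime(2019, 11, 14, 13, 0, 18, 940000, tzinfo=timezone.utc)
stop = start + timedelta(minutes=3)  # shortened window for the example
delay = timedelta(seconds=10)

# Assumption: the job is launched `delay` after each execution date.
schedule = [(tick, tick + delay) for tick in execution_dates(start, stop)]

for execution_date, launch_time in schedule:
    print(execution_date.isoformat(), "->", launch_time.isoformat())
```

With these values the sketch produces three execution dates (13:01, 13:02 and 13:03 UTC), each launched ten seconds later.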

More on Dates

In this section we explain how dates should be defined. Each date is a configuration element with two fields:

{
    # Mandatory : ISO 8601 format used for generating your date value
    format: yyyy-MM-dd'T'HH:mmZ

    # Optional : you can define an optional offset relative to the execution date.
    # This will be used to shift the date.
    #
    # Here we generate a date with a two minutes shift from the cron date.
    offset: -PT2m
}

Both the offset and the format accept values that follow the ISO 8601 standard.

Execution Date offset format Result
2017-12-05T10:30:00.000+01:00 -PT2m yyyy-MM-dd'T'HH:mmZ 2017-12-05T10
2017-12-05T10:30:00.000+01:00 +01:00 +01:00
2017-12-05T10:30:00.000+01:00 yyyy.MM.dd 2017.12.05
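
The arithmetic behind these rows can be reproduced with a short Python sketch. It is only an illustration: `parse_offset` is a hypothetical helper that handles time-only durations (hours, minutes, seconds), and the `strftime` patterns are Python stand-ins for the plan's own format strings, so the exact rendering produced by the plan engine may differ.

```python
import re
from datetime import datetime, timedelta, timezone

# Time-only ISO 8601 durations such as "-PT2m", "+PT2m" or "PT10s".
_DURATION = re.compile(
    r"^([+-])?PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$", re.IGNORECASE
)


def parse_offset(text: str) -> timedelta:
    """Convert a time-only ISO 8601 duration into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.group(i) for i in (2, 3, 4)):
        raise ValueError(f"unsupported duration: {text!r}")
    sign, hours, minutes, seconds = match.groups()
    delta = timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=int(seconds or 0),
    )
    return -delta if sign == "-" else delta


# Execution date from the table: 2017-12-05T10:30:00.000+01:00
execution_date = datetime(2017, 12, 5, 10, 30,
                          tzinfo=timezone(timedelta(hours=1)))

# Offset -PT2m with a pattern equivalent to yyyy-MM-dd'T'HH:mmZ
shifted = execution_date + parse_offset("-PT2m")
rendered = shifted.strftime("%Y-%m-%dT%H:%M%z")

# No offset, with a pattern equivalent to yyyy.MM.dd
day_only = execution_date.strftime("%Y.%m.%d")

print(rendered)
print(day_only)
```

The offset is applied first and the format second, so the two-minute shift is visible in the rendered minutes.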

Note

Refer to the TimeZones link for the available time zones.

Plan Configuration

Plan configuration files contain all the information required to iteratively generate jobs with the right date-time ranges.

plan_settings

{
    ...
    plan_settings: {
        // section we are going to describe
        ...
        persistence: [
            // subsection we are going to describe
            {...}
        ]
    }
}
  • cron : String

    Mandatory: Cron expression defining the scheduling

  • start: String

    Now: start date in the format "yyyy-MM-dd'T'HH:mmZ"

  • stop : String

    Stop date in the format "yyyy-MM-dd'T'HH:mmZ"

  • delay : String

    Offset between execution date and job launching

  • persistence: List of dictionaries

    Where checkpoint information will be saved

plan_settings: {
    ...
    # credentials are not required
    persistence: [
        {
            type: <String> -> elasticsearch
            index_name: <String> -> index name prefix
            ssl: <Boolean> -> true or false
            nodes: <List of dictionaries> [
                {
                    // values defined in nodes take priority over es_cluster
                    host: <String> -> localhost
                    port: <int> -> 9200
                }
            ]
            es_cluster: <String> -> name of your elasticsearch cluster located in your punchplatform.properties
            credentials: <Dictionary> {
                # either
                token: <String> your_token
                token_type: <String> -> ApiKey
                # or
                user: your_user
                password: your_password   
            }
        }
    ]
}
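
The priority rule between nodes and es_cluster can be pictured with the following Python sketch. It is an assumption about the behavior described in the comments above, not the actual implementation: `resolve_nodes` is a hypothetical function, and the `clusters` mapping (with made-up host names) stands in for whatever punchplatform.properties declares.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]


def resolve_nodes(persistence: dict,
                  clusters: Dict[str, List[Node]]) -> List[Node]:
    """Pick the Elasticsearch nodes for one persistence entry.

    Explicit `nodes` win; otherwise fall back to the `es_cluster` name.
    """
    nodes = persistence.get("nodes")
    if nodes:
        return [(node["host"], node["port"]) for node in nodes]
    cluster = persistence.get("es_cluster")
    if cluster in clusters:
        return list(clusters[cluster])
    raise ValueError("persistence entry needs either nodes or a known es_cluster")


# Hypothetical cluster declaration standing in for punchplatform.properties.
clusters = {"es_search": [("es-node-1", 9200), ("es-node-2", 9200)]}

with_nodes = {
    "type": "elasticsearch",
    "es_cluster": "es_search",
    "nodes": [{"host": "localhost", "port": 9200}],
}
without_nodes = {"type": "elasticsearch", "es_cluster": "es_search"}

print(resolve_nodes(with_nodes, clusters))     # explicit nodes are used
print(resolve_nodes(without_nodes, clusters))  # cluster definition is used
```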

model

{
    ...
    model: {
        // section we are going to describe
        dates: {
            // subsection we are going to describe
        }
    }
}
  • dates : Dictionary of dates

    Each key referencing a date is the template variable name to use in your job.template configuration file.

...
model: {
    dates: {
        mydate1: {
            format: <String> -> a valid datetime format (ISO 8601)
            offset: <String> -> date shift (-PT2m or +PT2m)
        }
    }
}
  • model.<anything>: Key-Value

    Any key-value pair defined at the root level of the model key is added to the templating context. These key-value pairs are then used as substitutions in the job.template configuration file.

model: {
    index_name: myindex
    port: 9200
    ...
    template10: template10_value
    dates: {
        ...
    }
}

Example based on the model section

# given the job.template below and the plan configuration above
{
    jobs: [
        {
            type: elastic_batch_input
            settings: {
                port: {{ port }}
                index: {{ index_name }}
                query: {
                    query: {
                        bool: {
                            must: [
                                {
                                    range: {
                                        @timestamp: {
                                            gte: {{ mydate1 }}
                                            lt: {{ mydate2 }}
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
            ...
        }
        ...
    ]
}

Note

The library used for templating is Jtwig, a template engine similar to Jinja2, together with its JSON extension.
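
To illustrate the substitution step without Jtwig, here is a minimal Python sketch that replaces `{{ name }}` placeholders with values from a context. It is a plain regular-expression stand-in, not Jtwig, and it ignores every other template feature. The `render` helper, the shortened template and the two date values are hypothetical; in a real plan the dates would be computed from each date's offset and format.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace every {{ name }} placeholder with its value from context."""
    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in context:
            raise KeyError(f"no value for template variable {name!r}")
        return str(context[name])

    return _PLACEHOLDER.sub(substitute, template)


# Root-level model entries, as in the model example above.
model = {"index_name": "myindex", "port": 9200}

# Illustrative date values, assumed to be already computed by the plan.
computed_dates = {
    "mydate1": "2017-12-05T10:28+0100",
    "mydate2": "2017-12-05T10:30+0100",
}

# The templating context merges root-level values and date values.
context = {**model, **computed_dates}

template = (
    "port: {{ port }}\n"
    "index: {{ index_name }}\n"
    "gte: {{ mydate1 }}\n"
    "lt: {{ mydate2 }}"
)

rendered = render(template, context)
print(rendered)
```

Raising an error on an unknown variable is a design choice of this sketch; it makes a typo in a template name visible immediately rather than producing a job file with an empty value.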

How to launch/manage a Plan

Launch a plan

To execute a plan, you can run these commands from your terminal:

# Default --runtime-environment is set to spark
# others: pyspark_5x (craig) and pyspark (dave)

# foreground/local
$ punchlinectl --plan <plan_path> --template <template_path>

# on a cluster
$ punchlinectl --plan <plan_path> --template <template_path> --deploy-mode cluster --spark-master <master_url>

Manage a plan cursor

A CLI is provided to manage plan cursors: punchline-plan-manager.

Prerequisites

plan-id generation rule

As you may already know, the plan --last-committed option is only supported when used with Shiva.

# a composition of tenant, channel name and plan name
TENANT="mytenant"
CHANNEL="aggregation"
PLAN_NAME="myplan"

# braces are needed so that "_" is not read as part of the variable name
PLAN_ID="${TENANT}_${CHANNEL}_${PLAN_NAME}"

echo "$PLAN_ID"
# prints: mytenant_aggregation_myplan
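
The same composition rule, expressed as a small Python helper for scripts that need to compute a plan id before calling the CLI. The function name `plan_id` is hypothetical; only the underscore-joined tenant, channel and plan name come from the rule above.

```python
def plan_id(tenant: str, channel: str, plan_name: str) -> str:
    """Join tenant, channel name and plan name with underscores."""
    return "_".join((tenant, channel, plan_name))


print(plan_id("mytenant", "aggregation", "myplan"))
```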

es-cluster

The Elasticsearch cluster is taken from punchplatform.properties.

All the Elasticsearch nodes declared in your es_cluster must be reachable from the machine where you run punchline-plan-manager.

commands

With this command, you can:

  • view the cursor metadata that the plan will be using to restart from
  • change the cursor that a plan will be using to restart from
# to view cursor metadata of a specific plan id
punchline-plan-manager --list-cursor --plan-id myplanid --tenant mytenant --es-cluster es_search --index-name-pattern platform-plan-cursor

# to change the cursor that a plan will be using to restart from
punchline-plan-manager --reset-cursor --plan-id myplanid --tenant mytenant --restart-date 2020-01-01T12:12:29.771Z --es-cluster es_search --index-name-pattern platform-plan-cursor