
Executing Plans

This section explains how to launch analytics plans in practice, and lists the useful commands to keep in mind.

A basic understanding of Spark concepts is required. If you are not familiar with them, please go through the Plan concepts section first.

File Configuration

In addition to jobs, the PunchPlatform analytics client also lets you execute Plans.

In brief, a Plan provides a way of executing jobs periodically with the help of templating.

Use case: fetching data from an Elasticsearch cluster every 10 minutes.

To accomplish this, we will make use of the two files below:

- a plan
- a job template

Definition: You can think of the plan as the file that holds all the metadata needed for your job: variables, scheduling information (intervals, etc.), and sink metadata (cluster name, ports, etc.). The job template, in turn, is the file describing what your use case needs; it relies on the plan for those values. Periodic job execution is thus achieved through templating.

Note

Jobs and Plans are compatible with both the JSON and HJSON formats.

Note

In a production environment, a plan can keep track of its last successful event and resume from it in case of failure. Use --last-committed to activate this option. This feature is only available when the plan is executed within a channel.

Note

In a production environment, you can launch a plan between two specific dates with the --date parameter. The two dates should be passed as strings separated by a comma, using the format yyyy-MM-dd'T'HH:mmZ. This feature is only available when the plan is executed within a channel.
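For reference, the expected date format can be produced like this (an illustrative Python snippet, not part of the platform; the sample dates are arbitrary):

```python
from datetime import datetime, timezone, timedelta

# yyyy-MM-dd'T'HH:mmZ (Java notation) maps to %Y-%m-%dT%H:%M%z in Python
fmt = "%Y-%m-%dT%H:%M%z"

start = datetime(2020, 1, 1, 8, 0, tzinfo=timezone.utc)
end = start + timedelta(minutes=10)

# Comma-separated pair, as the --date parameter expects
date_argument = f"{start.strftime(fmt)},{end.strftime(fmt)}"
print(date_argument)  # 2020-01-01T08:00+0000,2020-01-01T08:10+0000
```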

Let's use some file examples to be clearer:

plan.hjson:

With the help of templating, a job is generated from the template every minute with the appropriate metadata, and then launched as a PML job.

{
  configurations:{
    cluster_name: es_search
    index_input: platform-metricbeat-*
    nodes: ["localhost"]
  }
  cron: "*/1 * * * *"
  dates: {
    day: {
      duration: -PT1m
      format: yyyy.MM.dd
    }
    from: {
      duration: -PT1m
      format: yyyy-MM-dd'T'HH:mmZ
    }
    timezone: {
      format: +00:00
    }
    to: {
      format: yyyy-MM-dd'T'HH:mmZ
    }
  }
  delay: PT10s
}
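To illustrate what the dates section above produces, here is a rough Python sketch (a hypothetical re-implementation, not the platform's actual code): each variable takes the scheduled tick time, applies its optional ISO-8601 duration offset (-PT1m meaning "minus one minute"), and renders it with its format.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rendering of the plan's "dates" section: "from" and "day"
# are shifted back by one minute (-PT1m), "to" is the tick itself, and
# "timezone" is a fixed offset string.
def render_dates(tick: datetime) -> dict:
    fmt = "%Y-%m-%dT%H:%M%z"
    return {
        "day": (tick - timedelta(minutes=1)).strftime("%Y.%m.%d"),
        "from": (tick - timedelta(minutes=1)).strftime(fmt),
        "to": tick.strftime(fmt),
        "timezone": "+00:00",
    }

tick = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
print(render_dates(tick))
```

These values are then injected into the job template's placeholders such as {{day}}, {{from}} and {{to}}.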

job-template.hjson:

{
  job: [
    {
      component: input
      publish: [
        {
          stream: data
        }
      ]
      settings: {
        cluster_name: "{{cluster_name}}"
        source_column: source
        index: "{{index_input}}-{{day}}"
        nodes: {{ nodes | json_encode() }}
        query: {
          range: {
            @timestamp: {
              gte: "{{from}}"
              lt: "{{to}}"
              time_zone: "{{timezone}}"
            }
          }
        }
      }
      type: elastic_batch_input
    }
    {
      component: show
      settings: {
      }
      subscribe: [
        {
          component: input
          stream: data
        }
      ]
      type: show
    }
  ]
}
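The substitution mechanism itself can be sketched as follows (a minimal illustration only: the real engine is a full template language with filters such as json_encode(), which this sketch does not handle):

```python
import re

# Minimal {{name}} substitution: each tag is replaced by the matching
# value from the plan's variables. Filters (e.g. json_encode()) are
# deliberately left out of this sketch.
def render(template: str, variables: dict) -> str:
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables[m.group(1)]),
        template,
    )

snippet = 'index: "{{index_input}}-{{day}}"'
print(render(snippet, {"index_input": "platform-metricbeat-*", "day": "2019.12.31"}))
# index: "platform-metricbeat-*-2019.12.31"
```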

How to launch a Plan

Using a file path

To execute a plan, you can run one of the commands below from your terminal:

# foreground/local
$ punchplatform-analytics.sh --plan <plan_path> --template <template_path>

# on a cluster
$ punchplatform-analytics.sh --plan <plan_path> --template <template_path> --deploy-mode cluster --spark-master <master_url>