Job Configuration

This page is a quick reference for setting up your PML job with custom settings. These settings let you tune the job's computation speed through Spark and choose where its execution metrics are reported.

Example

Let's begin with a short PML example like the one below:

{
    runtime_id: my-job-id
    tenant: job_tenant
    job:
    [
        {
            description:
            '''
            read all metricbeat documents from local elasticsearch
            and generate a Dataset<Row> out of it
            '''
            type: elastic_batch_input
            component: input
            settings: {
                index: punch-academy-example
                cluster_name: es_search
                nodes: [ 
                    localhost 
                ]
                elastic_settings: {
                    es.index.read.missing.as.empty: yes
                }
                id_column: id
                source_column: source
                output_columns: [
                    {
                        type: string
                        field: "address.street"
                    }
                    {
                        type: integer
                        field: "age"
                    }
                ]
            }
            publish: [ 
                { 
                    stream: default 
                } 
            ]
        }
        {
            type: show
            component: show
            subscribe: [
                {
                    stream: default
                    component: input
                }
            ]
        }          
    ]
    spark_settings:
    {
        spark.executor.memory: 1g
        spark.executor.cores: "2"
        spark.executor.instances: "2"
    }

    metrics:
      {
        reporters:
        [
          {
            type: elasticsearch
            hosts:
            [
              {
                host: localhost
                port: 9200
              }
            ]
          }
          {
            type: console
          }
        ]
      }

}

Settings

runtime_id
(Optional) Define a unique id or let PML define one for you.

tenant
(Optional) Define a tenant name or let PML define one for you.

job
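Contains your PML itself: the list of nodes that make up the job, connected to one another through their publish and subscribe entries.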

spark_settings
(Optional) Refer to the Spark configuration documentation for the full list of available Spark settings.

Custom settings work with the following modes:

  • client
  • cluster
  • foreground (similar to local)
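
As an illustration of how these settings can be tuned, here is a sketch of a spark_settings section that gives each executor more memory and more cores than the example above. The property names are the ones already used in that example; the values are arbitrary and should be adapted to what your cluster actually offers.

```
spark_settings:
{
    # memory allocated to each executor (illustrative value)
    spark.executor.memory: 2g
    # number of cores used by each executor (illustrative value)
    spark.executor.cores: "4"
    # number of executors requested for the job
    spark.executor.instances: "2"
}
```

With these values the job requests 2 executors of 4 cores each, that is 8 cores in total.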

metrics
(Optional) Define which reporter will consume the execution metrics.
Two types of reporters are available:

  • console: Display the metrics in the terminal

  • elasticsearch: Send the metrics to Elasticsearch
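
For a quick local run you may only want the metrics in your terminal. The fragment below is a minimal sketch of such a metrics section. It assumes that a reporters list holding a single entry is accepted, which the example above suggests but does not state explicitly.

```
metrics:
{
    reporters:
    [
        {
            # print the execution metrics to the terminal only
            type: console
        }
    ]
}
```

To also send the metrics to Elasticsearch, add an elasticsearch reporter with its hosts list next to the console one, as in the full example.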