Job Configuration
This page is a quick reference for setting up a PML job with custom settings. With these settings you can tune your PML job for better computation speed and metrics reporting.
Example
Let's begin with a short PML example like the one below:
```hjson
{
  runtime_id: my-job-id
  tenant: job_tenant
  job: [
    {
      description:
      '''
      read all metricbeat documents from local elasticsearch
      and generate a Dataset<Row> out of it
      '''
      type: elastic_batch_input
      component: input
      settings: {
        index: punch-academy-example
        cluster_name: es_search
        nodes: [
          localhost
        ]
        elastic_settings: {
          es.index.read.missing.as.empty: yes
        }
        id_column: id
        source_column: source
        output_columns: [
          {
            type: string
            field: "address.street"
          }
          {
            type: integer
            field: "age"
          }
        ]
      }
      publish: [
        {
          stream: default
        }
      ]
    }
    {
      type: show
      component: show
      subscribe: [
        {
          stream: default
          component: input
        }
      ]
    }
  ]
  spark_settings: {
    spark.executor.memory: 1g
    spark.executor.cores: "2"
    spark.executor.instances: "2"
  }
  metrics: {
    reporters: [
      {
        type: elasticsearch
        hosts: [
          {
            host: localhost
            port: 9200
          }
        ]
      }
      {
        type: console
      }
    ]
  }
}
```
Settings
runtime_id
(Optional) Define a unique id, or let PML generate one for you.
tenant
(Optional) Define a tenant name, or let PML choose one for you.
job
Your PML pipeline is defined in this section: a list of nodes wired together through published and subscribed streams.
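For instance, the job from the example above boils down to two nodes, with the stream wiring made explicit (settings elided for brevity):

```hjson
job: [
  {
    type: elastic_batch_input
    component: input
    # ... settings elided ...
    # this node publishes a stream named "default"
    publish: [ { stream: default } ]
  }
  {
    type: show
    component: show
    # consume the "default" stream published by the "input" component
    subscribe: [ { stream: default, component: input } ]
  }
]
```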
spark_settings
(Optional) Please refer to: Spark configuration for the full list of available Spark settings.
Custom settings work with the following modes:
- client
- cluster
- foreground (similar to local)
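As a sketch, here is the `spark_settings` block from the example above with an additional standard Spark property; the values are illustrative, not recommendations, and should be tuned for your workload:

```hjson
spark_settings: {
  # memory and cores allocated to each executor
  spark.executor.memory: 1g
  spark.executor.cores: "2"
  # number of executors requested for the job
  spark.executor.instances: "2"
  # memory for the driver process (standard Spark property, shown as an example)
  spark.driver.memory: 1g
}
```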
metrics
(Optional) Define which reporters will consume the execution metrics.
Two types of reporters are available:

- console: display the metrics in the terminal
- elasticsearch: send the metrics to Elasticsearch
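Both reporter types can be combined, as in the example above: metrics are indexed into Elasticsearch and printed to the terminal at the same time:

```hjson
metrics: {
  reporters: [
    # send metrics to an elasticsearch cluster
    {
      type: elasticsearch
      hosts: [ { host: localhost, port: 9200 } ]
    }
    # also print metrics to the terminal
    { type: console }
  ]
}
```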