Configuration¶
Spark Punchline Schema¶
Spark punchlines have the following structure:
{
version: "6.0"
type: punchline
tenant: mytenant
channel: mychannel
runtime: spark
runtime_id: my-job-id
dag: [
...
]
metrics: {
...
}
settings: {
}
}
metrics¶
This section configures the reporting of monitoring metrics. Reported metrics are either event or error logs. The published information gives you better insight into your executed punchline.
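As a minimal sketch, a metrics section with a single Elasticsearch reporter could look like this (the index name is illustrative):

```
metrics: {
  reporters: [
    {
      type: elasticsearch
      index_name: mytenant-metrics
    }
  ]
}
```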
Settings¶
Spark Settings¶
The settings section lets you define punchline-level settings. It can contain any of the standard Spark settings; refer to the Spark configuration documentation for the complete list.
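For example, standard Spark properties such as executor memory and core counts go directly into the settings section (values here are illustrative):

```
settings: {
  spark.executor.memory: 1g
  spark.executor.cores: "2"
  spark.executor.instances: "2"
}
```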
Punch Settings¶
spark.additional.jar and spark.additional.pex let you include custom dependencies (jars or pex archives) at runtime.
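For instance, assuming a custom jar at /opt/libs/custom-udf.jar (a hypothetical path), the punchline settings could declare it as follows:

```
settings: {
  # hypothetical path to an extra dependency shipped with the job
  spark.additional.jar: /opt/libs/custom-udf.jar
}
```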
spark.pre_punchline_execution and spark.post_punchline_execution can be used in the pyspark runtime to execute arbitrary code before your Spark application starts and after it finishes.
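A sketch of how these hooks might be declared, assuming hypothetical Python callables my_hooks.setup and my_hooks.teardown available on the job's Python path:

```
settings: {
  # hypothetical module.function names; replace with your own hooks
  spark.pre_punchline_execution: my_hooks.setup
  spark.post_punchline_execution: my_hooks.teardown
}
```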
spark.worker.childopts can be used to override the default JVM memory settings.
Example
Here is an example:
{
version: "6.0"
runtime: spark
type: punchline
tenant: job_tenant
dag:
[
{
type: elastic_input
component: input
settings:
{
index: punch-academy-example
elastic_settings:
{
es.index.read.missing.as.empty: yes
}
id_column: id
column: source
output_columns:
[
{
type: string
field: address.street
}
{
type: integer
field: age
}
]
}
publish:
[
{
stream: default
}
]
}
{
type: show
component: show
subscribe:
[
{
stream: default
component: input
}
]
}
]
settings:
{
spark.executor.memory: 1g
spark.executor.cores: "2"
spark.executor.instances: "2"
spark.worker.childopts: -Xmx1g -Xms256m
}
metrics:
{
reporters:
[
{
type: elasticsearch
index_name: mytenant-metrics
}
{
type: console
}
]
}
}