
Configuration

Spark Punchline Schema

Spark punchlines have the following structure:

{
    version: "6.0"
    type: punchline
    tenant: mytenant
    channel: mychannel
    runtime: spark
    runtime_id: my-job-id
    dag: [
        ...
    ]
    metrics: {
        ...
    }
    settings: {
    }
}

Metrics

This section configures how monitoring metrics are reported. Reported metrics can be either events or error logs. The published information gives you better insight into how your punchline executed.
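As a minimal sketch, a metrics section that only prints metrics to the console could look like the fragment below. It reuses the reporters list and the console reporter type from the full example at the end of this page. Other reporter types and their options are not covered here.

```hjson
metrics:
{
  # list of reporters that receive the punchline metrics
  reporters:
  [
    {
      # print metrics to the console (same reporter type as in the full example)
      type: console
    }
  ]
}
```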

Settings

Spark Settings

The settings section lets you define punchline-level settings. It can contain any of the standard Spark settings. Refer to the Spark configuration documentation for a complete list.
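For instance, a settings section that tunes a few standard Spark properties might look like the fragment below. The property names are regular Spark configuration keys. The values are only illustrative and should be sized for your cluster. Numeric values are quoted as strings, following the full example at the end of this page.

```hjson
settings:
{
  # standard Spark properties (values are illustrative)
  spark.driver.memory: 1g
  spark.executor.memory: 2g
  spark.executor.cores: "2"
  # number of partitions used when shuffling data for joins and aggregations
  spark.sql.shuffle.partitions: "8"
}
```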

Punch Settings

spark.additional.jar and spark.additional.pex can be used to include custom dependencies at runtime.

spark.pre_punchline_execution and spark.post_punchline_execution can be used with the pyspark runtime to execute arbitrary code before and after your Spark application runs, respectively.

spark.worker.childopts can be used to override the default JVM memory settings.
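A sketch combining two of these Punch settings follows. The jar path is hypothetical, and whether spark.additional.jar expects a single path or a list is an assumption here, so check the value format expected by your Punch release. The spark.worker.childopts value uses the same syntax as in the full example.

```hjson
settings:
{
  # hypothetical path; the expected value format (single path vs list) is an assumption
  spark.additional.jar: "/opt/punch/extlib/my-udfs.jar"
  # override the default JVM memory settings
  spark.worker.childopts: -Xmx2g -Xms512m
}
```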

Example

Here is an example:

{
  version: "6.0"
  runtime: spark
  type: punchline
  tenant: job_tenant
  dag:
  [
    {
      type: elastic_batch_input
      component: input
      settings:
      {
        index: punch-academy-example
        elastic_settings:
        {
          es.index.read.missing.as.empty: yes
        }
        id_column: id
        source_column: source
        output_columns:
        [
          {
            type: string
            field: address.street
          }
          {
            type: integer
            field: age
          }
        ]
      }
      publish:
      [
        {
          stream: default
        }
      ]
    }
    {
      type: show
      component: show
      subscribe:
      [
        {
          stream: default
          component: input
        }
      ]
    }
  ]
  settings:
  {
    spark.executor.memory: 1g
    spark.executor.cores: "2"
    spark.executor.instances: "2"
    spark.worker.childopts: -Xmx1g -Xms256m
  }
  metrics:
  {
    reporters:
    [
      {
        type: elasticsearch
        index_name: mytenant-metrics
      }
      {
        type: console
      }
    ]
  }
}