
Metrics in Apache Spark

Description

When properly configured, any Spark job or plan will publish its metrics. Refer to the metrics reporters documentation section to see how to set it up. If the plan or job has no tenant or channel, these values are set to "default".

Metrics format and content

Here is an example of an Apache Spark metric "stage completed".

{
  "@timestamp": "2018-04-25T15:09:38.918Z",
  "type": "spark",
  "name": "spark.stage.completed",
  "job.runtime.id": "0ef774b0-1ea5-4008-ab75-836c27ec79b8",
  "job.deploy.mode": "client",
  "spark.event.timestamp": "2018-04-25T15:09:38.781Z",
  "platform.id": "my-platform",
  "platform.tenant": "mytenant",
  "platform.channel": "aggregation",
  "spark.stage.failure_reason": "None",
  "spark.stage.id": "2",
  "spark.stage.num_tasks": "1",
  "spark.stage.num_status": "succeeded"
}

Common fields

All the fields in the following list are present in all Spark metrics events (jobs and plans); a short parsing sketch follows these field descriptions:

  • @timestamp (date)

    Timestamp (ISO format) at which the document was received by the reporter (not the timestamp of the original event)

  • type (string)

    The technology name, always set to "spark" in this case.

  • name (string)

    The event name.

  • job.runtime.id (string)

    A unique job ID, especially useful for identifying all the events coming from the same application.

  • job.deploy.mode (string)

    The Spark application mode used to launch the job.

    values: "foreground", "client", "cluster"

  • platform.id (string)

    The platform ID from where the job has been launched.

  • platform.tenant (string)

    The tenant name from where the job has been launched.

  • platform.channel (string)

    The channel name from where the job has been launched.

The next field is dedicated to Spark jobs (plans currently do not have any "event" concept):

  • spark.event.timestamp (date)

    Actual timestamp (ISO format) of the original metric event. This value is extracted from the Spark event object itself.
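
The sketch below illustrates how a consumer could read these common fields from one metric document. It assumes the document is available as a raw JSON string and uses a plain Jackson ObjectMapper; the MetricFieldsReader class and its extractCommonFields() helper are hypothetical, not part of the product. Note that the keys are flat and contain literal dots, exactly as in the example document above.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;

public class MetricFieldsReader {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Hypothetical helper: extracts the common fields described above from a
    // raw metric document. Absent fields come back as empty strings.
    public static Map<String, String> extractCommonFields(String json) throws Exception {
        JsonNode doc = MAPPER.readTree(json);
        Map<String, String> fields = new HashMap<>();
        for (String key : new String[] {
                "@timestamp", "type", "name", "job.runtime.id", "job.deploy.mode",
                "platform.id", "platform.tenant", "platform.channel"}) {
            fields.put(key, doc.path(key).asText());
        }
        return fields;
    }
}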

Jobs metrics

Every reported event is generated by Spark itself. Internally, we rely on a SparkListener Java class that implements all the methods listed in the official documentation.

To get more information, see the Apache Spark documentation.
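
For illustration, here is a minimal sketch of what such a listener can look like. It is not the actual implementation: the MetricsSparkListener class name and the report() helper are assumptions, and only two callbacks are shown, but the SparkListener callbacks and StageInfo accessors are the standard Spark API.

import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerStageCompleted;

// Sketch of a metrics listener. The real implementation covers all the
// events listed in the table below.
public class MetricsSparkListener extends SparkListener {

    @Override
    public void onStageCompleted(SparkListenerStageCompleted stageCompleted) {
        // These accessors feed the "spark.stage.*" fields of the example document above.
        // failureReason() is a Scala Option, rendered as "None" when the stage succeeded.
        report("spark.stage.completed",
               "spark.stage.id=" + stageCompleted.stageInfo().stageId(),
               "spark.stage.num_tasks=" + stageCompleted.stageInfo().numTasks(),
               "spark.stage.failure_reason=" + stageCompleted.stageInfo().failureReason());
    }

    @Override
    public void onJobEnd(SparkListenerJobEnd jobEnd) {
        report("spark.job.end", "spark.job.id=" + jobEnd.jobId());
    }

    // Hypothetical helper: a real implementation would build the JSON document
    // and hand it to the configured metrics reporter instead of printing it.
    private void report(String name, String... fields) {
        System.out.println(name + " " + String.join(" ", fields));
    }
}

Such a listener is registered on the running context with sparkContext.addSparkListener(new MetricsSparkListener()).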

Here is the list of event types reported:

Name                      Description
spark.application.start   Called when the application starts
spark.job.start           Called when a job starts
spark.stage.submitted     Called when a stage is submitted
spark.task.start          Called when a task starts
spark.task.result         Called when a task begins remotely fetching its result (if needed)
spark.task.end            Called when a task ends
spark.stage.completed     Called when a stage completes successfully or fails
spark.job.end             Called when a job ends
spark.application.end     Called when the application ends

Plans metrics

Currently, the only metric reported by a Spark plan is its uptime. Here is a sample of the spark.plan.uptime metric document:

{
  "@timestamp": "2019-06-24T12:04:54.089Z",
  "spark.plan.start.date": "2019-06-24T12:04:21.557Z",
  "spark.plan.uptime.count": 29,
  "name": "spark.plan.uptime",
  "job.runtime.id": "353ba380-632b-4482-81b1-23a9163c4870",
  "type": "spark",
  "plan.runtime.id": "mytenant_aggregation_/Users/jonathan/Desktop/punchplatform-standalone-5.4.0-SNAPSHOT/conf/tenants/mytenant/channels/aggregation/plan.hjson",
  "platform.spark.job": "/Users/jonathan/Desktop/punchplatform-standalone-5.4.0-SNAPSHOT/conf/tenants/mytenant/channels/aggregation/plan.hjson",
  "platform.channel": "aggregation",
  "platform.id": "punchplatform-primary",
  "platform.tenant": "mytenant"
}