
Executing Jobs

This section explains how to launch Spark jobs in practice and lists the commands worth keeping in mind.

It is important to have a minimal understanding of Spark concepts. If you haven't already, please go through the Jobs concepts section first.

The basics: foreground mode

The quickest and easiest way to run a Spark job is the foreground mode, which is also the default mode. Use either of these commands:

# These two commands are strictly equivalent
$ punchlinectl start -p <punchline_path>
$ punchlinectl start -p <punchline_path> --deploy-mode foreground

In this mode, everything runs in a single process. This makes it the mode of choice when you develop PML stages and nodes: you can easily debug your PML applications from your favorite IDE.

In this mode, no information is displayed in the Spark UI.

All output shows up directly in your terminal, as in the following example:

$ punchlinectl start -p $PUNCHPLATFORM_CONF_DIR/samples/punchlines/spark/basics/punch_node.hjson


| address      | age | friends       | name  | base64   | decade |
|--------------|-----|---------------|-------|----------|--------|
| [clemenceau] | 21  | WrappedArr... | phil  | cGhpbA== | 1      |
| [clemenceau] | 23  | WrappedArr... | alice | YWxpY2U= | 3      |
| [clemenceau] | 53  | WrappedArr... | dimi  | ZGltaQ== | 3      |

root
 |-- address: struct (nullable = true)
 |    |-- street: string (nullable = true)
 |-- age: long (nullable = true)
 |-- friends: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- name: string (nullable = true)
 |-- base64: string (nullable = true)
 |-- decade: integer (nullable = true)

[
  {
    "name": "punch_data",
    "title": "SHOW"
  }
]

Local mode

In this mode, the client embeds both the driver and the executors in a single JVM. As in foreground mode, the client stays alive for the whole lifetime of the job.

$ punchlinectl start -p <punchline_path> --deploy-mode client --spark-master local[*] 
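In standard Spark syntax, local[*] runs Spark locally with as many worker threads as there are logical cores. On the command line above it is left unquoted, which works in most cases, but to the shell local[*] is a glob pattern. The bash snippet below is a general shell demonstration, not something specific to punchlinectl: it shows that the unquoted value is only passed through unchanged while no file named "local*" exists in the current directory, whereas the quoted form 'local[*]' is always kept literal. Some shells, such as zsh, instead abort with a "no matches found" error when the pattern matches nothing, so quoting is a safe habit.

```shell
#!/usr/bin/env bash
# Demonstrates why the Spark master value local[*] is best quoted.
# Work in a fresh, empty temporary directory so the glob matches nothing at first.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

unquoted_before=$(echo local[*])   # no matching file: bash keeps "local[*]"
touch 'local*'                     # create a file that the pattern matches
unquoted_after=$(echo local[*])    # the glob now expands to the file name "local*"
quoted=$(echo 'local[*]')          # quoted: always the literal "local[*]"

printf '%s\n%s\n%s\n' "$unquoted_before" "$unquoted_after" "$quoted"

# Clean up the temporary directory.
cd / && rm -rf "$demo_dir"
```

With the default bash options, the three printed lines are local[*], local* and local[*]: only the quoted form is guaranteed to reach the command unchanged.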

In this mode too, no information is displayed in the Spark UI.

Launch it on a cluster (cluster)

In this mode, the job is submitted to the Spark cluster: the client only submits the job to the Spark master and then returns, and the job is distributed across the Spark workers. This is the production setup.

To submit a job in cluster mode, execute the following command:

$ punchlinectl start -p <punchline_path> --deploy-mode cluster --spark-master <master_url>

Launch it on a cluster (client)

In this mode, the client starts the driver internally, i.e. without spawning a child process. The driver in turn requests executors from the master.

The client stays alive for the whole lifetime of the job and collects metrics and logs, which are redirected to its console.

To submit a job in client mode, execute the following command:

$ punchlinectl start -p <punchline_path> --deploy-mode client --spark-master <master_url>
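The four ways of launching a punchline described in this section differ only in the --deploy-mode and --spark-master values. As a recap, the bash function below is a hypothetical convenience wrapper (punchline_cmd is not part of the punch tooling) that prints the matching punchlinectl command for each mode. It only uses the flags shown in this section and does not execute anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the punch tooling: prints the punchlinectl
# command for a given mode, using only the flags documented in this section
# (-p, --deploy-mode, --spark-master). Nothing is executed.
punchline_cmd() {
  local mode="$1" punchline="$2" master="${3:-}"
  case "$mode" in
    foreground)
      # Default mode: everything runs in a single process, no Spark UI.
      echo "punchlinectl start -p $punchline --deploy-mode foreground"
      ;;
    local)
      # Driver and executors share a single JVM; the value is quoted so the
      # printed line is safe to paste into any shell.
      echo "punchlinectl start -p $punchline --deploy-mode client --spark-master 'local[*]'"
      ;;
    client|cluster)
      # Both cluster-based modes need the URL of the Spark master.
      if [ -z "$master" ]; then
        echo "error: $mode mode needs a Spark master URL" >&2
        return 1
      fi
      echo "punchlinectl start -p $punchline --deploy-mode $mode --spark-master $master"
      ;;
    *)
      echo "error: unknown mode '$mode'" >&2
      return 1
      ;;
  esac
}
```

For instance, assuming a Spark standalone master listening on its default port 7077 on a host called master-host (both hypothetical here), punchline_cmd cluster my_job.hjson spark://master-host:7077 prints the cluster-mode command above with that URL filled in. Printing rather than running the command keeps the sketch free of side effects, so you can review the command before launching it.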