Parallel processing with Spark

This chapter covers the prerequisites for launching a punchline application on a Spark cluster.

Requirements

Let's consider a group of hosts (h1, h2, h3 and A), where A is an arbitrary host among h1, h2 and h3.

  • h1: shiva master
  • h2: shiva worker
  • h3: shiva worker

Deployment setup example

         Spark    Shiva
Master   h1       h1
Worker   h2, h3   h2, h3

Some definitions

Now let's consider R1 and R2:

  • R1: the executed punchline does not need to be co-located with shiva

    This means that, for the punchline to execute properly on a distributed Spark cluster, it is not mandatory to have a shiva worker deployed on each Spark node.

  • R2: the executed punchline must be co-located with shiva

    Each Spark node within the Spark cluster must be co-located with a running shiva worker.

Execution compatibility

The table below shows which deployment setup is required before a Spark application can be executed in a distributed fashion.

  • args_1: --deploy-mode client --spark-master local[*]
  • args_2: --deploy-mode client --spark-master spark://${SPARK_MASTER}
  • args_3: --deploy-mode cluster --spark-master spark://${SPARK_MASTER}
                  Client (local)   Client (deployed)   Cluster
=============== ================ =================== ===========
shiva             A                h1, h2, h3          h1, h2, h3
spark             A                h2, h3              h1, h2, h3
can be executed   R1               R1                  R2
args              args_1           args_2              args_3
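As a sketch, the three argument sets above could be passed to a punchline launcher as follows. The `punchlinectl start` command name and the `my_punchline.yaml` path are illustrative placeholders, not the exact CLI of your platform; substitute your actual launcher and punchline file:

```sh
# args_1: client mode, local Spark master.
# Runs entirely on a single host A (requirement R1).
punchlinectl start --punchline my_punchline.yaml \
  --deploy-mode client --spark-master 'local[*]'

# args_2: client mode against the Spark cluster.
# The driver stays on the submitting host; executors run on h2, h3 (R1).
punchlinectl start --punchline my_punchline.yaml \
  --deploy-mode client --spark-master spark://${SPARK_MASTER}

# args_3: cluster mode against the Spark cluster.
# The driver itself runs on an arbitrary Spark node, so a shiva worker
# must be co-located on every Spark node (R2).
punchlinectl start --punchline my_punchline.yaml \
  --deploy-mode cluster --spark-master spark://${SPARK_MASTER}
```

The key difference is where the driver runs: in client mode it stays on the host you submit from, while in cluster mode Spark places it on any worker node, which is why R2 demands shiva on every node.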

Quick overview

  • In client mode, with the Spark master set either to local or to the ${SPARK_MASTER} of your Spark cluster, shiva is not mandatory for punchlines to execute properly.
  • In cluster mode, the Spark master can only point to the ${SPARK_MASTER} of your Spark cluster, and shiva must be co-located on each Spark node for the punchline to execute properly.