Example

This chapter compares writing a regular Spark job with configuring a PunchPlatform job. Both versions implement the same use case, Index Count: counting the documents in an Elasticsearch index.

Regular Java

First, you need to implement a Java main class:

package com.thales.services.cloudomc.punchplatform.analytics.example;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class Main {

        public static void main(String[] args) {

                // Get or create the Spark session object. It is your link to the Spark cluster.
                SparkSession sparkSession = SparkSession.builder().appName("example").getOrCreate();

                // Create a Java Spark context from the Spark session. This is the older
                // RDD-based Spark API, which the Elasticsearch connector call below requires.
                JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sparkSession.sparkContext());

                // Get the index name given as the first argument
                if (args.length < 1) {
                        System.err.println("Usage: Main <index>");
                        System.exit(1);
                }
                String index = args[0];

                // Use the Elasticsearch connector to load the documents as (id, JSON string) pairs
                JavaPairRDD<String, String> data = JavaEsSpark.esJsonRDD(jsc, index);

                // Count the documents
                long count = data.count();

                // Print count
                System.out.println("Number of documents in index: " + count);

        }

}

Next, package the class in a jar and submit it to Spark:

./spark-submit --deploy-mode cluster --master spark://localhost:7077 --class com.thales.services.cloudomc.punchplatform.analytics.example.Main punchplatform-analytics-example-4.0.1-SNAPSHOT-jar-with-dependencies.jar metricbeat-2018.01.19

You can try this command yourself: the compiled jar is provided in the folder "punchplatform-standalone-###/external/spark-2.2.1-bin-hadoop2.7/punchplatform/analytics/job/example".

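PunchPlatform Job

With PunchPlatform, the same use case is expressed as a configuration file rather than Java code. The job is a JSON array of nodes, and each node subscribes to fields published by an earlier node. Here, the input node reads the index, the count node counts the documents, and the print node displays the result:
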
[
        {
                "type": "elastic_batch_input",
                "component": "input",
                "settings": {
                        "index": "metricbeat-2018.01.19",
                        "cluster_name": "es_search",
                        "nodes": [
                                "localhost"
                        ]
                },
                "publish": [
                        {
                                "field": "data"
                        }
                ]
        },
        {
                "type": "count",
                "component": "count",
                "subscribe": [
                        {
                                "component": "input",
                                "field": "data"
                        }
                ],
                "publish": [
                        {
                                "field": "value"
                        }
                ]
        },
        {
                "type": "print",
                "component": "print",
                "settings": {
                        "title": "COUNT"
                },
                "subscribe": [
                        {
                                "component": "count",
                                "field": "value"
                        }
                ]
        }
]
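
To make the publish/subscribe wiring of the job above more concrete, here is a minimal Java sketch of the idea. It is not PunchPlatform code: the JobWiringSketch class, its Node type, the run method and the "COUNT: ..." output format are all hypothetical, and an in-memory list stands in for the Elasticsearch index. It only illustrates how a value published under a component and field name can be handed to the node that subscribes to it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class JobWiringSketch {

    /** Hypothetical stand-in for one entry of the job array; not a PunchPlatform class. */
    static final class Node {
        final String component;     // the node's "component" name
        final String subscribeTo;   // "component.field" it reads, or null for a source node
        final String publishField;  // field it publishes, or null for a sink node
        final Function<Object, Object> logic;

        Node(String component, String subscribeTo, String publishField,
             Function<Object, Object> logic) {
            this.component = component;
            this.subscribeTo = subscribeTo;
            this.publishField = publishField;
            this.logic = logic;
        }
    }

    /** Runs the nodes in declaration order; returns published values keyed by "component.field". */
    static Map<String, Object> run(List<Node> nodes) {
        Map<String, Object> published = new HashMap<>();
        for (Node node : nodes) {
            Object input = null;
            if (node.subscribeTo != null) {
                if (!published.containsKey(node.subscribeTo)) {
                    throw new IllegalStateException(
                            node.component + " subscribes to unknown " + node.subscribeTo);
                }
                input = published.get(node.subscribeTo);
            }
            Object output = node.logic.apply(input);
            if (node.publishField != null) {
                published.put(node.component + "." + node.publishField, output);
            }
        }
        return published;
    }

    /** Wires input -> count -> print like the job above, using a list instead of an index. */
    static Map<String, Object> runExample(List<String> documents) {
        List<Node> nodes = Arrays.asList(
                new Node("input", null, "data", ignored -> documents),
                new Node("count", "input.data", "value",
                        data -> (long) ((List<?>) data).size()),
                new Node("print", "count.value", null, value -> {
                    System.out.println("COUNT: " + value); // output format is made up
                    return null;
                }));
        return run(nodes);
    }

    public static void main(String[] args) {
        runExample(Arrays.asList("{\"a\":1}", "{\"a\":2}", "{\"a\":3}"));
    }
}
```

In this sketch the node order in the list is also the execution order, which is a simplification chosen for brevity; nothing in the job file above states how PunchPlatform itself schedules the nodes.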