HOWTO add your analytics algorithm

Why do that¶

You may want to make your own algorithm available as part of the Punchline SDK. Doing so lets users add a node or stage to their Punchline configuration file that leverages your algorithm.

Prerequisites¶

You need a punch-standalone installation with Spark.

What to do¶

Implement the spark ML interface¶

First, implement the Spark machine-learning public interfaces:

Transformer: for a transformer
Estimator: for an estimator

There is no official Spark documentation for this, but you can look at this O'Reilly page and/or at the source code of the ML algorithms that are already implemented.
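As a rough illustration of the Transformer side, here is a minimal sketch written against the public `org.apache.spark.ml.Transformer` class. The class name `ScaleValue`, the column names `value` and `scaled`, and the `factor` parameter are all hypothetical and chosen only for the example. How the punchline maps node settings onto Spark params is not shown here.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{DoubleParam, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StructType}

// Hypothetical example: multiplies the "value" column by a configurable factor
// and writes the result to a new "scaled" column.
class ScaleValue(override val uid: String) extends Transformer {

  // No-arg constructor generating a random uid, as Spark's built-in stages do.
  def this() = this(Identifiable.randomUID("scaleValue"))

  // A Spark ML parameter with a default value and a setter.
  final val factor = new DoubleParam(this, "factor", "multiplication factor")
  setDefault(factor, 2.0)

  def setFactor(value: Double): this.type = set(factor, value)

  // Validates the input schema and declares the output column.
  override def transformSchema(schema: StructType): StructType = {
    require(schema.fieldNames.contains("value"), "missing input column 'value'")
    schema.add("scaled", DoubleType, nullable = true)
  }

  // Applies the transformation to the dataset.
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema)
    dataset.withColumn("scaled", col("value") * $(factor))
  }

  // defaultCopy relies on the single-String (uid) constructor above.
  override def copy(extra: ParamMap): ScaleValue = defaultCopy(extra)
}
```

An Estimator follows the same pattern, except that it implements `fit` and returns a `Model`, which is itself a Transformer.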

Deploy Your Jar¶

Compile and package your algorithm using your favorite build tool (Maven, sbt, ...). Note that your jar must not embed the Spark libraries that are already shipped with the punchplatform. On a standalone the core Spark jars are located under

# Copy built jar to
$PUNCHPLATFORM_INSTALL_DIR/extlib/spark/
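One common way to keep the Spark libraries out of your jar is to declare them with the provided scope in your build. The following is a sketch for sbt only. The version string is a placeholder (an assumption, not a value taken from the punchplatform) and must be replaced with the Spark version shipped with your installation. With Maven, the equivalent is `<scope>provided</scope>` on the same dependencies.

```scala
// build.sbt (sketch)
// Placeholder: replace with the Spark version shipped with your punchplatform.
val sparkVersion = "x.y.z"

// "Provided" dependencies are available at compile time
// but are not packaged into the resulting jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % sparkVersion % Provided,
  "org.apache.spark" %% "spark-mllib" % sparkVersion % Provided
)
```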

Use the algorithm in a PML configuration¶

You can now refer to your algorithm in a pipeline_stage:

{
  version: "6.0"
  runtime: spark
  type: punchline
  tenant: mytenant
  dag: [...]
  settings: {
      spark.additional.jars: my_ml.jar
  }
}

and in your MlTransformer node:

{
    type: your.algorithm.Name
    settings: {
        # your ML parameters
    }
}