
HOWTO add your analytics algorithm

Why do that

You may want to make your own algorithm available as part of the PML sdk. Doing so allows users to add a node or stage to their PML configuration file that leverages your algorithm.

Prerequisites

You need a punchplatform-standalone installation that includes Spark.

What to do

Implement the spark ML interface

First, implement the Spark machine-learning (ML) public interfaces.

There is no official Spark documentation for this, but you can look at this O'Reilly page and/or at the source code of the ML algorithms that are already implemented.
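As an illustration, here is a minimal sketch of a custom stage, assuming the interfaces in question are Spark ML's pipeline stage abstractions (here `Transformer`) and Spark 2.x or later. The package, the class name `UpperCaser`, and the `inputCol`/`outputCol` parameters are hypothetical. How PML maps a stage's settings onto such parameters is not covered here.

```scala
package your.algorithm

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical stage: copies a string column into a new, upper-cased column.
class UpperCaser(override val uid: String) extends Transformer {

  // No-arg constructor; Spark ML stages conventionally generate their own uid.
  def this() = this(Identifiable.randomUID("upperCaser"))

  // Illustrative parameters (names are assumptions, not a PML contract).
  final val inputCol = new Param[String](this, "inputCol", "input column name")
  final val outputCol = new Param[String](this, "outputCol", "output column name")

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // The actual processing: add the output column to the dataset.
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema) // fail early on an invalid input schema
    dataset.withColumn($(outputCol), upper(col($(inputCol))))
  }

  // Declare the output schema without touching any data.
  override def transformSchema(schema: StructType): StructType = {
    require(schema($(inputCol)).dataType == StringType,
      s"Column ${$(inputCol)} must be of type string")
    StructType(schema.fields :+ StructField($(outputCol), StringType, nullable = true))
  }

  // defaultCopy relies on the (uid: String) constructor above.
  override def copy(extra: ParamMap): UpperCaser = defaultCopy(extra)
}
```

An algorithm that must be trained first would typically extend `Estimator` and return a `Model` from `fit` instead. The three overridden methods above are the minimum a `Transformer` requires.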

Deploy Your Jar

Compile and package your algorithm using your favorite tool (maven, sbt, ...). Note that your jar must not embed the many Spark libraries that are already shipped with the punchplatform. On a standalone, the core Spark jars are located under:

$ punchplatform-standalone-*/external/spark-x.y.z-bin-hadoop2.7/jars

The jars delivered as part of the punchplatform PML are located under:

$ punchplatform-standalone-*/external/spark-x.y.z-bin-hadoop2.7/punchplatform/analytics/job/additional_jars

Make sure your jar, together with any of its dependencies that are not already among the core Spark jars, is also placed in additional_jars.
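If you build with sbt, one common way to keep the Spark libraries out of your jar is to declare them with the "provided" scope. The fragment below is a sketch only: the project name is hypothetical and the version string is a placeholder to replace with the Spark version of your standalone.

```scala
// build.sbt (sketch; values are placeholders, not punchplatform requirements)
name := "my-algorithm" // hypothetical project name

// Replace with the version found in spark-x.y.z-bin-hadoop2.7.
val sparkVersion = "x.y.z"

// "provided": available at compile time, but left out of the packaged jar
// because the platform already ships these libraries.
libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
```

The `%%` operator appends the Scala binary version of your build to the artifact name, so your project's `scalaVersion` should match the Scala version the shipped Spark jars were built with. With Maven, the equivalent is `<scope>provided</scope>` on the Spark dependencies.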

Use the algorithm in a PML configuration

You can now refer to your algorithm in a pipeline_stage:

{
    "type": "your.algorithm.Name",
    "settings": {
        # your algorithm parameters ...
    }
}