HOWTO add your analytics algorithm
Why do that¶
You may want to make your own algorithm available as part of the PML SDK. Doing so lets users add a node or stage to their PML configuration file that leverages your algorithm.
Prerequisites¶
You need a punchplatform-standalone installation that includes Spark.
What to do¶
Implement the spark ML interface¶
First, implement one of the Spark machine-learning public interfaces:
- Transformer: for a transformer
- Estimator: for an estimator
There is no official Spark documentation for this, but you can look at this O'Reilly page and/or at the source code of the ML algorithms that are already implemented.
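As an illustration, here is a minimal sketch of a custom Transformer written in Scala against the Spark 2.x ML API. The class name `UpperCaseTransformer` and the parameters `inputCol` and `outputCol` are hypothetical and chosen only for this example. How the punchplatform maps the settings of a PML stage onto these parameters is not covered here, so treat the setters as an assumption.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical example: copies a string column into a new column, upper-cased.
class UpperCaseTransformer(override val uid: String) extends Transformer {

  // A no-argument constructor lets the stage be instantiated by class name.
  def this() = this(Identifiable.randomUID("upperCase"))

  // Parameters of the algorithm (names are assumptions made for this sketch).
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "name of the input string column")
  final val outputCol: Param[String] =
    new Param[String](this, "outputCol", "name of the output string column")

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  // The actual transformation applied to the incoming dataset.
  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema) // fail early if the input column is missing
    dataset.withColumn($(outputCol), upper(col($(inputCol))))
  }

  // Describes the output schema without touching any data.
  override def transformSchema(schema: StructType): StructType = {
    require(schema.fieldNames.contains($(inputCol)),
      s"Input column ${$(inputCol)} does not exist")
    StructType(schema.fields :+ StructField($(outputCol), StringType, nullable = true))
  }

  // Required by the Params contract; defaultCopy relies on the uid constructor.
  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}
```

An Estimator follows the same pattern, except that it implements `fit(dataset)` and returns a `Model`, which is itself a Transformer.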
Deploy Your Jar¶
Compile and package your algorithm using your favorite tool (maven, sbt, ...). Note that your jar must not embed the many Spark libraries that are already shipped with the punchplatform. On a standalone, the core Spark jars are located under:
```
$ punchplatform-standalone-*/external/spark-x.y.z-bin-hadoop2.7/jars
```
The jars delivered as part of the punchplatform PML are located under:
```
$ punchplatform-standalone-*/external/spark-x.y.z-bin-hadoop2.7/punchplatform/analytics/job/additional_jars
```
Make sure your jar, as well as any of its dependencies that are not already part of the core Spark jars, is also located in additional_jars.
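If you build with sbt, one way to keep the Spark libraries out of your jar is to declare them with the `provided` scope. The fragment below is only a sketch: the project name, version and Scala version are placeholders, and `x.y.z` stands for the Spark version shipped with your standalone.

```scala
// build.sbt (sketch; name, version and scalaVersion are assumptions)
name := "my-pml-algorithm"
version := "0.1.0"

// Assumption: must match the Scala version the shipped Spark was built with.
scalaVersion := "2.11.12"

// "provided": on the classpath at compile time, but not packaged with your jar.
// Replace x.y.z with the Spark version found under external/spark-x.y.z-bin-hadoop2.7.
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "x.y.z" % "provided"
```

A plain `sbt package` never bundles dependencies, so the scope mostly matters if you build a fat jar, for example with the sbt-assembly plugin. Any dependency that is not marked `provided` and is not among the core Spark jars still has to be copied to additional_jars alongside your own jar.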
Use the algorithm in a PML configuration¶
You can now refer to your algorithm in a pipeline_stage:
```
{
  "type": "your.algorithm.Name",
  "settings": {
    # your algorithm parameters ...
  }
}
```