HOWTO add your analytics algorithm
Why do that¶
You may want to make your own algorithm available as part of the Punchline SDK. Users can then add a node or stage to their Punchline configuration file to leverage your algorithm.
Prerequisites¶
You need a punch-standalone installed with spark.
What to do¶
Implement the spark ML interface¶
First, implement the Spark machine-learning public interfaces:
- Transformer: for a transformer
- Estimator: for an estimator
There is no official Spark documentation for this, but you can look at this O'Reilly page and/or at the source code of the already implemented ML algorithms.
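As an illustration, here is a minimal sketch of a custom transformer written in Java against the Spark ML `Transformer` abstract class, assuming Spark 2.x or later. The class name `TextLengthTransformer` and the column names `text` and `text_length` are hypothetical and only serve the example. How Punchline maps the node `settings` onto your algorithm's parameters is not covered by this sketch.

```java
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.util.Identifiable$;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.length;

// Hypothetical example: appends an integer column holding the length
// of the string column "text". Column names are assumptions.
public class TextLengthTransformer extends Transformer {

    private final String uid;

    // No-arg constructor: generates a random unique identifier.
    public TextLengthTransformer() {
        this(Identifiable$.MODULE$.randomUID("textLength"));
    }

    // defaultCopy() looks up this public String constructor by reflection.
    public TextLengthTransformer(String uid) {
        this.uid = uid;
    }

    @Override
    public String uid() {
        return uid;
    }

    @Override
    public Dataset<Row> transform(Dataset<?> dataset) {
        // Validate the input schema before touching the data.
        transformSchema(dataset.schema());
        return dataset.withColumn("text_length", length(col("text")));
    }

    @Override
    public StructType transformSchema(StructType schema) {
        // fieldIndex throws IllegalArgumentException if "text" is missing.
        schema.fieldIndex("text");
        return schema.add("text_length", DataTypes.IntegerType, true);
    }

    @Override
    public Transformer copy(ParamMap extra) {
        return defaultCopy(extra);
    }
}
```

An estimator follows the same pattern: it extends `Estimator<M>` and implements `fit(Dataset<?> dataset)`, which returns a fitted model `M` that is itself a transformer.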
Deploy Your Jar¶
Compile and package your algorithm using your favorite tool (maven, sbt, ...). Note that your jar must not embed the many spark libraries that are already shipped with the punchplatform. On a standalone the core spark jars are located under
# Copy built jar to
$PUNCHPLATFORM_INSTALL_DIR/extlib/spark/
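If you build with Maven, one common way to keep the Spark libraries out of your jar is to declare them with the `provided` scope. The fragment below is only a sketch: the `scala.binary.version` and `spark.version` properties are placeholders that you would define yourself so that they match the Spark distribution shipped with your platform.

```xml
<!-- Sketch of a pom.xml dependency; both properties are placeholders
     to be set to the versions shipped with your platform. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
  <!-- provided: available at compile time, not packaged in the jar -->
  <scope>provided</scope>
</dependency>
```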
Use the algorithm in a PML configuration¶
You can now refer to your algorithm in a pipeline_stage:
{
  version: "6.0"
  runtime: spark
  type: punchline
  tenant: mytenant
  dag: [...]
  settings: {
    spark.additional.jars: my_ml.jar
  }
}
and in your MlTransformer node:
{
  type: your.algorithm.Name
  settings: {
    # your ML parameters
  }
}