HOWTO add your analytics node
Why do that
This guide explains how to add your own (Java or Scala) node to a PunchPlatform Machine-Learning JSON configuration.
Prerequisites
You need a punchplatform-standalone installed with Spark.
What to do
Implement the Node interface
You must implement the Node interface provided by the punchplatform-job library. You can install this dependency in your local Maven repository with the command:
```shell
$ punchplatform-development.sh
```
Note
This command exports the various platform jars into your local Maven repository so that you can include them as dependencies in your Maven projects. Of course, if you are part of the punch community, you will work directly with the punch Git repositories.
Add the punchplatform-job dependency printed by this command to your Maven module pom. You can now create a class implementing the interface:
```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.type.TypeReference;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import org.thales.punch.ml.configuration.JacksonConfigurationName;
import org.thales.punch.ml.configuration.NodeName;
import org.thales.punch.ml.configuration.NodeType;
import org.thales.punch.ml.configuration.NodeType.Type;
import org.thales.punch.ml.job.IDeclarer; // adjust the package to your release if it differs
import org.thales.punch.ml.job.Input;
import org.thales.punch.ml.job.Node;
import org.thales.punch.ml.job.Output;

@NodeName("your_node_name")
@NodeType(Type.OUTPUT_NODE)
public class YourNode implements Node {

    private static final long serialVersionUID = 1L;

    @JsonProperty(value = "param_1")
    public String param_1 = "default_param";

    @JsonCreator
    public YourNode() {
        super();
    }

    @Override
    public void execute(Input input, Output output) throws Exception {
        System.out.println(input.getSingleton().get());
    }

    @Override
    public void declare(IDeclarer declarer) throws Exception {
        declarer.subscribeSingleton(new TypeReference<Dataset<Row>>() {});
    }
}
```
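A word on the `new TypeReference<Dataset<Row>>() {}` expression in `declare`: the trailing `{}` creates an anonymous subclass, which is what lets the generic type argument survive type erasure. The self-contained sketch below re-implements that "super type token" trick with only the JDK; `TypeToken` is an illustrative stand-in, not part of the punch or Jackson APIs.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class TypeTokenDemo {

    // Minimal re-implementation of the "super type token" trick that
    // Jackson's TypeReference relies on: instantiating an anonymous
    // subclass keeps the generic type argument observable at runtime.
    abstract static class TypeToken<T> {
        final Type captured;

        TypeToken() {
            ParameterizedType superType =
                    (ParameterizedType) getClass().getGenericSuperclass();
            this.captured = superType.getActualTypeArguments()[0];
        }
    }

    public static void main(String[] args) {
        // The trailing {} creates the anonymous subclass, just like
        // new TypeReference<Dataset<Row>>() {} in declare() above.
        TypeToken<List<String>> token = new TypeToken<List<String>>() {};
        System.out.println(token.captured); // e.g. java.util.List<java.lang.String>
    }
}
```

Without the `{}`, `getGenericSuperclass()` would see a raw `TypeToken` and the element type would be lost, which is why the anonymous-subclass form is mandatory in `declare`.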
Deploy your Jar
Compile your algorithm into a jar (with Maven, sbt, ...). To keep your jar lightweight, treat as provided the libraries already shipped with Spark and the PunchPlatform:
```shell
punchplatform-standalone-*/external/spark-2.2.1-bin-hadoop2.7/jars
punchplatform-standalone-*/external/spark-2.2.1-bin-hadoop2.7/punchplatform/analytics/job/additional_jars
```
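In Maven, the `provided` scope is what keeps those already-shipped jars out of your artifact. As a sketch, assuming your node uses Spark SQL, the dependency could be declared like this (the artifact and version must match the standalone's Spark, here 2.2.1 built for Scala 2.11):

```xml
<!-- "provided": available at compile time, but NOT bundled into your jar,
     because the platform's Spark distribution already ships it. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
```

Do the same for any punchplatform jar found under the directories listed above; only your own code and truly external libraries should end up in the jar you deploy to `additional_jars`.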
Use Your Node in a PML configuration
You can now refer to your algorithm in a `pipeline_stage`:
```
{
    type: your_node_name
    component: your_component
    settings: {
        param_1: hello
    }
    subscribe: [
        {
            stream: input_stream
            component: input_component
        }
    ]
}
```
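Note that because `param_1` declares a default value (`default_param`) in the node class, the `settings` block is optional. A minimal stage, reusing the same hypothetical component and stream names, can omit it entirely:

```
{
    type: your_node_name
    component: your_component
    subscribe: [
        {
            stream: input_stream
            component: input_component
        }
    ]
}
```

With this configuration, `param_1` keeps the value `default_param` assigned in the Java field initializer.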