HOWTO add my python node

Why do that

A major strength of pml-pyspark is that it allows you to create your own nodes to match your specific needs. This section describes how to add a custom node to your standalone.

For users working on the git repository: the global Maven installation of all the repos automatically adds your changes to the new standalone.

It is also possible to manually update an installed standalone with the zip found in the target folder of the repo. If you don't have a target folder, you need to run mvn clean install at least once in the repo. To upgrade, copy the zip to the standalone pyspark folder (replacing the old one) and unzip it. Finally, copy the .venv of the repo to the freshly unzipped pyspark folder in your standalone.


You need a punchplatform-standalone installed with pyspark (use the --with-pyspark argument when running the standalone installation script, or use the graphical interface and tick pyspark).

What to do

Once your standalone is installed, the pyspark package containing all nodes is located at: my-standalone-dir/external/punchplatform-pyspark-x.y.z/

All the following shell commands assume that you are at the root of the pyspark folder.

It should contain at least these items:

$ ls -a
.    ..    .venv    elasticsearch-hadoop-6.8.2.jar    python-deps    requirements.txt

The files that interest us are the nodes zip archive and requirements.txt.


The workflow for adding a new node will be simplified in the Dave release. We will provide a CLI to automatically import your dependencies and generate the needed files.

Step 1 : Backup

A little precaution that can save you from big mistakes: back up the files.

Additionally, it's not recommended to modify an existing node, since updates might happen in the future. This could cause breaking changes in your PML pipelines.
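A minimal sketch of the backup step. The scratch directory and timestamped naming scheme below are assumptions for illustration; in practice, run the cp commands from the root of your pyspark folder against the real files:

```shell
# Demo in a scratch directory; in practice work at the pyspark folder root.
mkdir -p /tmp/pyspark-backup-demo && cd /tmp/pyspark-backup-demo
touch requirements.txt                          # stand-in for the shipped file

# Keep a timestamped copy so you can always roll back.
STAMP=$(date +%Y%m%d-%H%M%S)
cp requirements.txt "requirements.txt.bak-$STAMP"
ls requirements.txt.bak-*
```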

Step 2 : Unzip

This will create three folders:
- nodes contains the nodes (obviously)
- shared contains files that implement generic methods which can be used by the nodes
- core contains the backend engine files

Step 3.1 : Add your file

To add a node, just copy the python file containing your node to the previously mentioned nodes folder.
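Concretely, this step is a single copy. The file name my_custom_node.py below is hypothetical, as is the scratch directory used to keep the sketch self-contained:

```shell
# Demo in a scratch directory; in practice the nodes/ folder is the one
# produced by the unzip step. "my_custom_node.py" is a hypothetical name.
mkdir -p /tmp/pyspark-addnode-demo/nodes && cd /tmp/pyspark-addnode-demo
printf '# my custom pml-pyspark node\n' > my_custom_node.py

cp my_custom_node.py nodes/
ls nodes
```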

Step 3.2 (optional) : Add additional libraries

You might want to use Python libraries other than the ones included in the standalone: the list of packaged libraries is in the requirements.txt file. If your node imports a new library that is not included, you need to add the library:

  • Either add it manually with pip install (don't forget to source the .venv)

  • Or add it to requirements.txt. You can specify the version with the syntax package-name==version, or just use the package name as you would install it with pip. Then run the following:

source .venv/bin/activate
pip install -U -r requirements.txt
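For example, pinning a new dependency in requirements.txt before reinstalling could look like this (pandas==1.0.5 is only an example package and version, and the scratch file stands in for the shipped requirements.txt):

```shell
# Demo in a scratch directory; in practice edit the real requirements.txt.
mkdir -p /tmp/pyspark-req-demo && cd /tmp/pyspark-req-demo
printf 'pyspark\n' > requirements.txt            # stand-in for the shipped file

echo 'pandas==1.0.5' >> requirements.txt         # pinned-version syntax
cat requirements.txt
```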

Step 4 : Re-zip and run

Once the node is fully functional, it's time to repack it into a new archive:

zip -r <archive-name>.zip nodes shared

Don't forget the shared folder

Now you can execute your PML as with any punch example (no need to source the .venv):

# Use either (for the Craig release)
RELEASE="pyspark_5x"
# or (for the Dave release)
RELEASE="pyspark"
# then
punchlinectl --punchline /path/to/new/example.pml --runtime-environment $RELEASE

This script behaves similarly; both might be merged together in the future.