Track 2 PySpark Node Development

Abstract

This track explains how to code your own custom Python node.

Check out the $PUNCHPLATFORM_CONF_DIR/training/aim/track2 folder. All files referenced in this chapter are located in that folder. Start by reading the README.md file carefully.

Dependency Management

Read the Dependency Management Guide to understand the issues at stake.

Development

You can use the IDE (or text editor) of your choice, but we recommend the PyCharm IDE.

Use the punchpkg tool to package and deploy your nodes on your local standalone platform. Refer to the PunchPkg section.

Example

Working directory

cd $PUNCHPLATFORM_CONF_DIR/training/aim/track2

Prerequisites

A standalone platform installed: https://punchplatform.com

Try it out!

A Makefile is provided: use it to clean, lint and format your custom node code!

# Check linting and code formatting for the algorithms module
make inspect path=algorithms/

# Remove any leftover .pyc files
make clean
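
For readers unfamiliar with .pyc cleanup, here is a minimal Python sketch of what a clean step of this kind typically does. It assumes that `make clean` only deletes compiled bytecode files; the actual Makefile recipe is not shown in this chapter, and `remove_pyc` is a hypothetical helper name used for illustration.

```python
from pathlib import Path


def remove_pyc(root):
    """Delete every .pyc file under `root` and return the removed paths.

    Assumption: this mirrors what a typical `clean` target does; it is not
    taken from the track2 Makefile.
    """
    removed = []
    # sorted() materializes the listing before any file is deleted
    for path in sorted(Path(root).rglob("*.pyc")):
        path.unlink()
        removed.append(path)
    return removed
```

Calling `remove_pyc("algorithms")` from the working directory would remove the bytecode files under that folder and leave the .py sources untouched.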

Note that we will be using punchpkg...

# Use punchpkg pyspark info to get the install directory of pyspark

    >   punchpkg pyspark info

# Let's try running our template_node

# To begin, we will make our node available to our shells and editor

    > eval "$(_PUNCHPKG_COMPLETE=source punchpkg)"    # enable auto-completion
    > punchpkg pyspark link-external-nodes $(pwd)    # note: pwd here is the root directory of this README.txt
    > punchpkg pyspark list-external-nodes    # check that the node was linked properly
    # Install the custom dependencies needed by your module.
    # Note: if a module is not available on PyPI, convert it to PEX and run the same command on the PEX file.
    > punchpkg pyspark install-dependencies $(pwd)/complex_algorithm_dependencies
    > punchlinectl start -p $(pwd)/full_job.yaml

--[[
__________                    .__    .____    .__               
\______   \__ __  ____   ____ |  |__ |    |   |__| ____   ____  
 |     ___/  |  \/    \_/ ___\|  |  \|    |   |  |/    \_/ __ \ 
 |    |   |  |  /   |  \  \___|   Y  \    |___|  |   |  \  ___/ 
 |____|   |____/|___|  /\___  >___|  /_______ \__|___|  /\___  >
                     \/     \/     \/        \/       \/     \/ 
--]]
____   ________ _________  _ 
|__]\_/ [__ |__]|__||__/|_/  
|    |  ___]|   |  ||  \| \_ 


using nodes from ./nodes sources
Hello punch

Execution took 0.18007254600524902 seconds

Now let's add autocompletion to our favorite IDE.

# Grab our punchline_python.whl file and install it with pip install in a virtualenv.
# Note: when you pip install additional modules, be sure to track them in a separate file,
# i.e. don't mix our installed dependencies with yours, since this would generate big PEX files...
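
One way to keep your own modules in a separate file is to compare the virtualenv contents before and after you install them. The sketch below is a minimal illustration, assuming both listings are plain `name==version` lines such as `pip freeze` prints. The helpers `parse_requirements` and `own_dependencies`, and the package versions, are hypothetical and not part of punchpkg.

```python
import re


def parse_requirements(text):
    """Map normalized package names to their requirement lines.

    Assumption: one plain `name==version` entry per line.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The package name ends at the first version-specifier character
        name = re.split(r"[=<>!~\[;\s]", line, maxsplit=1)[0]
        entries[name.lower().replace("_", "-")] = line
    return entries


def own_dependencies(baseline, current):
    """Return the lines present in `current` but absent from `baseline`."""
    base = parse_requirements(baseline)
    cur = parse_requirements(current)
    return sorted(line for name, line in cur.items() if name not in base)


# Hypothetical listings: before and after installing your own modules
baseline = "pyspark==2.4.3\nnumpy==1.16.4\n"
current = "pyspark==2.4.3\nnumpy==1.16.4\npandas==0.25.0\n"
print(own_dependencies(baseline, current))
```

With these sample listings only the pandas line is reported, which is the kind of content you would keep in your own dependency file.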

Coding/deploying your custom node

Follow our development guide.

Making your node available to our environment

# If your node uses custom modules such as pandas,
# provide a text file named after your module.
# The text file should list only the custom modules your node uses.
punchpkg pyspark install full/path/to/text_file/custom_modules

>   punchpkg pyspark install complex_algorithm_dependencies

# Check if your custom module is properly installed
# A JSON document is printed to stdout; search for the key custom_pex_dependencies.
# Under this key, you will see custom_modules.
punchpkg pyspark list-dependencies
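
If you prefer to check that output programmatically, a minimal sketch follows. The exact layout of the JSON document is not described here, so the sample below is a made-up stand-in, and `find_key` is a hypothetical helper that simply searches for the key at any nesting depth.

```python
import json


def find_key(document, key):
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(document, dict):
        for name, value in document.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(document, list):
        for item in document:
            yield from find_key(item, key)


# Hypothetical stand-in for the JSON printed on stdout; the real layout may differ
sample_output = '{"pex": {"custom_pex_dependencies": ["custom_modules"]}}'

matches = list(find_key(json.loads(sample_output), "custom_pex_dependencies"))
print(matches)
```

An empty `matches` list would mean the key is absent from the document, which suggests the custom dependencies were not installed.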

# Check the current module
punchpkg pyspark info

# Install your custom node from its full path (press tab twice for autocompletion)
punchpkg pyspark install <tab><tab>

>   punchpkg pyspark install $(pwd)/algorithms

# List installed nodes
punchpkg pyspark list-nodes

# Executing a node
# Either use our PL editor or run punchlinectl from your shell
punchlinectl start -p full/path/to/job.punchline

>   punchlinectl start -p full_job.punchline -v


--[[
__________                    .__    .____    .__               
\______   \__ __  ____   ____ |  |__ |    |   |__| ____   ____  
 |     ___/  |  \/    \_/ ___\|  |  \|    |   |  |/    \_/ __ \ 
 |    |   |  |  /   |  \  \___|   Y  \    |___|  |   |  \  ___/ 
 |____|   |____/|___|  /\___  >___|  /_______ \__|___|  /\___  >
                     \/     \/     \/        \/       \/     \/ 
--]]
____   ________ _________  _ 
|__]\_/ [__ |__]|__||__/|_/  
|    |  ___]|   |  ||  \| \_ 


using nodes from ./nodes sources
Hello punch

Execution took 0.18007254600524902 seconds