
Track 2 PySpark Node Development

Abstract

This track explains how to code your own custom Python node.

Check out the conf/training/aim/track2 folder. All files referenced in this chapter are located in that folder. Start by reading the README.md file carefully.

Dependency Management

Read the Dependency Management Guide to understand the issues at stake.

Node Execution

You can use the Python editor of your choice, but we recommend the PyCharm IDE.

Use the punchpkg tool to package and deploy your nodes onto your local standalone platform. Refer to the PunchPkg section.

Node Settings

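Check out the full_job.punchline punchline configuration file. It chains two nodes: step1 (type complex_algorithm) publishes a stream named data, and just_print_to_stdout (type python_show) subscribes to that stream.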

{
    name: helloworld
    channel: default
    version: "6.0"
    tenant: default
    runtime: pyspark
    dag: [
        {
            type: complex_algorithm
            component: step1
            publish: [
                {
                    stream: data
                }
            ]
            settings: {
                param1: "{{my_date}}"
            }
        }
        {
            type: python_show
            component: just_print_to_stdout
            subscribe: [
                {
                    component: step1
                    stream: data
                }
            ]
            settings: {
            }
        }
    ]
    settings: {
        spark.additional.pex: complex_algorithm_dependencies.pex
    }
}
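
The publish/subscribe wiring in the punchline above can be sanity-checked with a few lines of plain Python. The sketch below is only an illustration: it mirrors the dag section as a Python dict by hand (the platform itself reads the punchline file, not this dict), and the helper name find_dangling_subscriptions is hypothetical, not part of the Punch API. It reports every subscription that points at a component/stream pair no node publishes.

```python
# Hand-written mirror of the dag section of full_job.punchline.
# Illustrative only: this dict is an assumption made for the example,
# it is not how the platform loads the configuration.
punchline = {
    "name": "helloworld",
    "runtime": "pyspark",
    "dag": [
        {
            "type": "complex_algorithm",
            "component": "step1",
            "publish": [{"stream": "data"}],
            "settings": {"param1": "{{my_date}}"},
        },
        {
            "type": "python_show",
            "component": "just_print_to_stdout",
            "subscribe": [{"component": "step1", "stream": "data"}],
            "settings": {},
        },
    ],
}


def find_dangling_subscriptions(job):
    """Return (subscriber, component, stream) tuples that no node publishes.

    Hypothetical helper, not part of the Punch API.
    """
    # Collect every (component, stream) pair that some node publishes.
    published = {
        (node["component"], pub["stream"])
        for node in job["dag"]
        for pub in node.get("publish", [])
    }
    # Keep only the subscriptions whose source pair is not published.
    return [
        (node["component"], sub["component"], sub["stream"])
        for node in job["dag"]
        for sub in node.get("subscribe", [])
        if (sub["component"], sub["stream"]) not in published
    ]


if __name__ == "__main__":
    # An empty list means every subscription has a matching publish.
    print(find_dangling_subscriptions(punchline))
```

For the punchline shown here the list is empty, because just_print_to_stdout subscribes to exactly the stream that step1 publishes. Renaming either stream on one side only would make the pair show up in the result.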