User Defined Functions

Instead of developing punchline nodes, a simpler solution is to provide additional Spark user defined functions (UDFs). UDFs are fully part of the Spark ecosystem. They let you provide your own functions and make them visible from within a Spark SQL query.

The benefit of UDFs is that you can invoke them directly from Spark SQL statements. This is very flexible, and you still benefit from Spark's optimization capabilities. Refer to our UDF reference guide for details.
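To make "invoking a UDF from a Spark SQL statement" concrete, here is a minimal PySpark sketch. It uses the standard `spark.udf.register` call rather than the punch packaging mechanism, and the names `normalize_hostname`, `register_udfs` and the `logs` table are hypothetical, chosen only for illustration.

```python
from typing import Optional


def normalize_hostname(value: Optional[str]) -> Optional[str]:
    """Hypothetical example logic: trim and lowercase a hostname, passing nulls through."""
    if value is None:
        return None
    return value.strip().lower()


def register_udfs(spark) -> None:
    """Register the function on a SparkSession so SQL queries can call it by name."""
    # Imported lazily so the plain function above can be tested without Spark installed.
    from pyspark.sql.types import StringType

    spark.udf.register("normalize_hostname", normalize_hostname, StringType())


# Once registered, the function is visible from a query
# (the `logs` table and its `host` column are assumptions):
#   spark.sql("SELECT normalize_hostname(host) AS host FROM logs")
```

Keeping the logic in a plain function and registering it separately makes the UDF easy to unit test before it is packaged and deployed.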

Once you have your UDF, use it directly through the punch SQL node. Refer to the punchpkg tool to package and deploy your UDF jars or Python modules, which in turn makes them available to the SQL node.

Starter Kit

Refer to the online UDF starter kit. After building the package, follow the installation guide.

The starter kit provides a very simple use case: converting a string representation of an array into an array of strings. Once you are familiar with it, you can tackle other use cases such as data enrichment, pattern matching, or scientific computation.
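The string-to-array conversion described above could look roughly like the following Python sketch. This is not the starter kit's actual code: the function name `str_to_array`, the helper `register_str_to_array`, and the assumed input format (comma-separated items, optionally wrapped in square brackets and quotes) are all assumptions made for illustration.

```python
from typing import List, Optional


def str_to_array(value: Optional[str]) -> Optional[List[str]]:
    """Convert a string such as "[a, b, c]" into ["a", "b", "c"].

    Assumed input format: comma-separated items, optionally enclosed in
    square brackets, each item optionally wrapped in single or double quotes.
    """
    if value is None:
        return None
    inner = value.strip()
    # Drop the enclosing brackets when both are present.
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    if not inner.strip():
        return []
    # Split on commas, then remove surrounding whitespace and quotes.
    return [item.strip().strip("'\"") for item in inner.split(",")]


def register_str_to_array(spark) -> None:
    """Expose the function to Spark SQL with an array<string> return type."""
    # Imported lazily so the conversion logic can be tested without Spark installed.
    from pyspark.sql.types import ArrayType, StringType

    spark.udf.register("str_to_array", str_to_array, ArrayType(StringType()))
```

After registration, a query such as `SELECT str_to_array(tags) FROM events` would return an array column (the `events` table and `tags` column are hypothetical). Items that themselves contain commas are not handled by this sketch.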