UDF: User Defined Function
Let's try enriching our data with our own UDF.
yearToMonth is a function that takes one parameter of type Integer and returns an Integer:
SELECT yearToMonth(age) AS num_years, * FROM people_dataset
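As a rough illustration, the logic behind such a function could look like the plain-Java sketch below. It assumes that yearToMonth simply multiplies a number of years by 12, which the text above does not state explicitly; the class name YearToMonthSketch is hypothetical. The sketch deliberately avoids any Spark dependency so that it runs on its own.

```java
import java.util.function.Function;

// Plain-Java sketch of the logic behind a yearToMonth-style UDF.
// Assumption: the function converts a number of years into months (x 12).
class YearToMonthSketch {

    static final int MONTHS_PER_YEAR = 12;

    // SQL columns can hold NULL, so a null input yields a null result.
    static Integer yearToMonth(Integer years) {
        if (years == null) {
            return null;
        }
        return years * MONTHS_PER_YEAR;
    }

    public static void main(String[] args) {
        // java.util.function.Function stands in here for the functional
        // interface a UDF framework would expect (one argument, one result).
        Function<Integer, Integer> udf = YearToMonthSketch::yearToMonth;
        System.out.println(udf.apply(30));   // prints 360
        System.out.println(udf.apply(null)); // prints null
    }
}
```

Handling null explicitly matters because the function receives boxed Integer values: without the check, a NULL in the age column would cause a NullPointerException during unboxing.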
You can view UDFs in Spark's ecosystem as a means to simplify data processing or data enrichment.
In general, a UDF is a function that takes a given number of parameters. Those parameters can be one or more columns and/or constant values that act as options in your UDF code. A UDF returns a single column whose type must be one of Spark's data types. Since Spark's data types support nested data structures, you can still output multiple columns inside a single one. Later on, you can use some of the built-in SQL functions to explode the nested result into multiple columns.
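To make the idea of "column parameters plus a constant option, one nested result" concrete, here is a minimal plain-Java sketch. Everything in it is hypothetical (the class AgeBreakdownSketch, the method splitAge, and the field names years and months); it only illustrates the shape of such a function and does not use the Spark API. In Spark, the nested value would be declared with one of the nested data types, such as a struct or a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a function with one "column" parameter and one constant option
// that packs two output fields into a single nested value.
class AgeBreakdownSketch {

    // totalMonths  : the per-row value (would come from a column)
    // monthsPerYear: a constant option, identical for every row
    static Map<String, Integer> splitAge(Integer totalMonths, int monthsPerYear) {
        if (monthsPerYear <= 0) {
            throw new IllegalArgumentException("monthsPerYear must be positive");
        }
        if (totalMonths == null) {
            return null; // propagate SQL NULL
        }
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Integer> result = new LinkedHashMap<>();
        result.put("years", totalMonths / monthsPerYear);
        result.put("months", totalMonths % monthsPerYear);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitAge(30, 12)); // prints {years=2, months=6}
    }
}
```

The caller gets back one value that still carries two logical columns, which can then be unpacked into separate columns downstream.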
To browse the built-in functions packaged with Spark, refer to: Built-in-Functions
Refer to Here
Follow this link here
Developing and installing your custom UDF
To develop your own UDF, we provide a starter-kit Maven project.
Feel free to use it: UDF Maven project starter-kit
After building the package, you can follow the installation guide: Installation Guide