User Defined Functions
Spark user-defined functions (UDFs) are an important and powerful feature. Let's see how you can leverage them in the punch. Consider a people dataset as follows:
Say you want to add a column containing the age converted from years into months.
The proper way to do that is to register a new function with Spark, called for example yearToMonth, and invoke it from within SQL as follows:
SELECT yearToMonth(age) AS num_months, * FROM people_dataset
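The registration step can be sketched in PySpark. This is a minimal sketch, assuming Spark is driven from Python and that age is an integer column of a temporary view named people_dataset; the punch may package and register UDFs differently. Only the SQL name yearToMonth comes from the text; the Python helpers year_to_month, register_year_to_month and months_query are hypothetical. The alias num_months reflects that the value is expressed in months.

```python
# Minimal PySpark sketch (assumption: Spark is driven from Python).
# Only the SQL name "yearToMonth" comes from the text; helper names are hypothetical.


def year_to_month(age):
    """Convert an age in years to months, passing SQL NULL (None) through."""
    if age is None:
        return None
    return age * 12


def register_year_to_month(spark):
    """Register year_to_month so that SQL can call it as yearToMonth(...)."""
    # Deferred import: the conversion logic above does not need PySpark.
    from pyspark.sql.types import LongType

    spark.udf.register("yearToMonth", year_to_month, LongType())


def months_query(spark):
    """Register the UDF, then run the query against the people_dataset view."""
    register_year_to_month(spark)
    return spark.sql("SELECT yearToMonth(age) AS num_months, * FROM people_dataset")
```

For example, with a hypothetical DataFrame people_df registered via people_df.createOrReplaceTempView("people_dataset"), calling months_query(spark).show() displays every original column plus num_months.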
UDFs are functions that take a number of parameters. These parameters can be one or more columns and/or constant values, which can be used as options in your UDF code. A UDF returns a single column of a single data type.
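To illustrate mixing a column parameter with a constant option, here is a hypothetical convertAge UDF: its first argument is the age column and its second is a string literal selecting the target unit. It is a PySpark sketch assuming a Python driver and an integer age column; the function names and the set of supported units are illustrative only. Whatever the inputs, it returns one column of one type (LongType).

```python
# Hypothetical UDF mixing a column argument (age) with a constant option (unit).

# Exact conversion factors from years; other units are deliberately unsupported.
_FACTORS_FROM_YEARS = {"years": 1, "quarters": 4, "months": 12}


def convert_age(age, unit):
    """Convert an age in years into `unit`; None (SQL NULL) passes through."""
    if age is None:
        return None
    if unit not in _FACTORS_FROM_YEARS:
        # Raising makes the Spark job fail loudly instead of emitting NULLs.
        raise ValueError(f"unsupported unit: {unit!r}")
    return age * _FACTORS_FROM_YEARS[unit]


def register_convert_age(spark):
    """Register convert_age for SQL as convertAge(column, 'unit')."""
    from pyspark.sql.types import LongType  # deferred: only needed with Spark

    spark.udf.register("convertAge", convert_age, LongType())
```

Once registered, SELECT convertAge(age, 'months') AS age_months, * FROM people_dataset passes the literal 'months' to every invocation, so a single UDF covers several conversions.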
If you need several output columns, do it in two steps. Spark data types support nested data structures, so you can first have the UDF generate a single column containing a nested structure, such as a struct. Then use Spark SQL to flatten the nested result into multiple columns; for example, selecting result.* expands every field of a struct column named result into its own column.
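The two-step approach can be sketched in PySpark, assuming a Python driver and an integer age column in the people_dataset view. The hypothetical ageUnits UDF returns one struct column with two fields, and the query then flattens it with star expansion (units.*). Spark's explode function turns array or map elements into rows rather than columns, so for a struct, selecting its fields is the usual way to obtain separate columns. All names here are illustrative.

```python
# Step 1: a hypothetical UDF returning one struct column with two fields.
# Step 2: Spark SQL star expansion (units.*) turns the struct into two columns.

FLATTEN_QUERY = """
SELECT units.*, age
FROM (SELECT ageUnits(age) AS units, age FROM people_dataset) AS t
"""


def age_units(age):
    """Return (months, quarters) for an age in years; Spark maps the tuple to the struct."""
    if age is None:
        return None
    return (age * 12, age * 4)


def register_age_units(spark):
    """Register age_units as ageUnits with a struct return type."""
    from pyspark.sql.types import LongType, StructField, StructType

    schema = StructType([
        StructField("months", LongType(), True),
        StructField("quarters", LongType(), True),
    ])
    spark.udf.register("ageUnits", age_units, schema)


def flattened_units(spark):
    """Register the UDF, then run the two-step query."""
    register_age_units(spark)
    return spark.sql(FLATTEN_QUERY)
```

With this sketch, flattened_units(spark) yields the columns months, quarters and age; keeping the struct inside a subquery means the UDF is evaluated once per row before its fields are split out.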