Punch
The punch node lets you run a punchlet on each row of an incoming Dataset. Say you receive the following dataset:
```
| column1 | column2 | column3 |
| "hello" | true    | 17      |
...
```
For each row, the punchlet receives a JSON document like this:
```json
{
  "column1" : "hello",
  "column2" : true,
  "column3" : 17
}
```
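As a mental model only (this is not the Punch implementation), a row can be seen as an ordered map from column name to value, and that map is what the punchlet sees as its JSON document. The sketch below builds such a map in plain Java and renders it with a tiny hand-written JSON printer. The class and method names (RowToDocument, toDocument, toJson) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: models how a dataset row maps to the JSON document a punchlet receives.
final class RowToDocument {

    // Pair each column name with the value at the same position, preserving column order.
    static Map<String, Object> toDocument(List<String> columns, List<Object> values) {
        if (columns.size() != values.size()) {
            throw new IllegalArgumentException("columns and values must have the same length");
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            doc.put(columns.get(i), values.get(i));
        }
        return doc;
    }

    // Minimal JSON rendering for strings, booleans and numbers (no escaping), for display only.
    static String toJson(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder("{ ");
        boolean first = true;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            first = false;
            Object v = e.getValue();
            String rendered = (v instanceof String) ? "\"" + v + "\"" : String.valueOf(v);
            sb.append('"').append(e.getKey()).append("\" : ").append(rendered);
        }
        return sb.append(" }").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toDocument(
                List.of("column1", "column2", "column3"),
                List.of("hello", true, 17));
        System.out.println(toJson(doc));
        // prints { "column1" : "hello", "column2" : true, "column3" : 17 }
    }
}
```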
The punchlet can produce one or several additional columns. Here is a punchlet that simply adds two columns:
```
{
  [column4] = [column1] + " world";
  [column5] = [column3] * 2;
}
```
This produces the following rows:
```
| column1 | column2 | column3 | column4       | column5 |
| "hello" | true    | 17      | "hello world" | 34      |
...
```
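To make the effect of the punchlet concrete, here is a plain-Java equivalent of the two assignments applied to one row. It is only an illustration of the semantics shown in the table above, not how Punch compiles or runs punchlets. The class name AddColumns is hypothetical, and the sketch assumes column3 holds an Integer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java equivalent of the example punchlet; not how Punch executes punchlets.
final class AddColumns {

    // Mirrors: [column4] = [column1] + " world"; [column5] = [column3] * 2;
    static Map<String, Object> apply(Map<String, Object> input) {
        Map<String, Object> out = new LinkedHashMap<>(input); // keep the original columns
        out.put("column4", input.get("column1") + " world");
        out.put("column5", ((Integer) input.get("column3")) * 2); // assumes column3 is an Integer
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("column1", "hello");
        row.put("column2", true);
        row.put("column3", 17);
        System.out.println(apply(row));
        // prints {column1=hello, column2=true, column3=17, column4=hello world, column5=34}
    }
}
```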
Here is the corresponding node definition:
```hjson
{
  type: punch_batch
  component: punch
  settings: {
    punchlet_code:
      '''
      {
        [column4] = [column1] + " world";
        [column5] = [column3] * 2;
      }
      '''
    output_columns: [
      {
        type: string
        field: column4
      }
      {
        type: integer
        field: column5
      }
    ]
  }
  subscribe: [
    {
      component: input
      stream: documents
    }
  ]
  publish: [
    {
      stream: documents
    }
  ]
}
```
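The node definition above declares each added column with a type and a field in output_columns. One plausible reading is that the output dataset needs a typed schema for the new columns before the punchlet runs. The sketch below illustrates that idea under stated assumptions: the declared fields are read from the punchlet result in declaration order and type-checked. Only the two types used in the example (string, integer) are handled. This is not the Punch implementation, and DeclaredOutputColumns is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: use declared output_columns (field -> type) to pick and type-check
// the values a punchlet added. Only the types from the example are handled.
final class DeclaredOutputColumns {

    static List<Object> extract(Map<String, String> declared, Map<String, Object> punchletResult) {
        List<Object> values = new ArrayList<>();
        for (Map.Entry<String, String> col : declared.entrySet()) {
            Object v = punchletResult.get(col.getKey());
            boolean ok;
            switch (col.getValue()) {
                case "string":
                    ok = v instanceof String;
                    break;
                case "integer":
                    ok = v instanceof Integer;
                    break;
                default:
                    throw new IllegalArgumentException("type not handled in this sketch: " + col.getValue());
            }
            if (!ok) {
                // A missing field (null) also fails the check.
                throw new IllegalStateException("column " + col.getKey() + " is not of type " + col.getValue());
            }
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> declared = new LinkedHashMap<>();
        declared.put("column4", "string");
        declared.put("column5", "integer");

        Map<String, Object> result = new LinkedHashMap<>();
        result.put("column1", "hello");
        result.put("column4", "hello world");
        result.put("column5", 34);

        System.out.println(extract(declared, result)); // prints [hello world, 34]
    }
}
```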
Selecting Input Columns
To expose only some of the input columns to the punchlet, use the input_columns property:
```hjson
{
  type: punch_batch
  component: punch
  settings: {
    punchlet_code:
      '''
      {
        [column4] = [column1] + " world";
        [column5] = [column3] * 2;
      }
      '''
    input_columns: [
      column1
      column3
    ]
    ...
}
```
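The projection that input_columns performs can be pictured as copying only the listed columns into the document handed to the punchlet. The sketch below is a minimal illustration under two assumptions. First, a null list stands for "input_columns not set", which exposes every column. Second, a listed column that is absent from the row is silently skipped. The real node may behave differently in the second case. InputColumns and project are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the input_columns projection; not the Punch implementation.
final class InputColumns {

    // Returns the document the punchlet would see. A null list means "input_columns not set".
    static Map<String, Object> project(Map<String, Object> row, List<String> inputColumns) {
        if (inputColumns == null) {
            return new LinkedHashMap<>(row); // not set: every column is visible
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        for (String column : inputColumns) {
            // Assumption: a listed column missing from the row is silently skipped.
            if (row.containsKey(column)) {
                doc.put(column, row.get(column));
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("column1", "hello");
        row.put("column2", true);
        row.put("column3", 17);
        System.out.println(project(row, List.of("column1", "column3")));
        // prints {column1=hello, column3=17}
        System.out.println(project(row, null));
        // prints {column1=hello, column2=true, column3=17}
    }
}
```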
Generating Several Rows
Your punchlet can output an array of documents instead of a single JSON document. In that case, one output row is generated for each element of the array.
```
punchlet_code:
  '''
  {
    Tuple results;
    for (int i = 0; i < 3; i++) {
      Tuple tmp;
      tmp:[column4] = [column1] + " " + i;
      tmp:[column5] = [column3] + i;
      results.append(tmp);
    }
    // this is a notation to overwrite the top level document
    root:/ = results;
  }
  '''
```
For the input row shown earlier, this produces three rows:

```
| column1 | column2 | column3 | column4   | column5 |
| "hello" | true    | 17      | "hello 0" | 17      |
| "hello" | true    | 17      | "hello 1" | 18      |
| "hello" | true    | 17      | "hello 2" | 19      |
```
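The one-row-in, many-rows-out behaviour is essentially a flatMap. Each document in the returned array is merged with a copy of its input row, which matches the output table above. The sketch below reproduces that table in plain Java. The stand-in punchlet method and the merge rule (added columns appended to the input row) are assumptions made for illustration, and SeveralRows is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each input row yields one output row per document the punchlet returns.
final class SeveralRows {

    // Plain-Java stand-in for the example punchlet: three documents for i = 0, 1, 2.
    static List<Map<String, Object>> punchlet(Map<String, Object> in) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Map<String, Object> tmp = new LinkedHashMap<>();
            tmp.put("column4", in.get("column1") + " " + i);
            tmp.put("column5", (Integer) in.get("column3") + i);
            results.add(tmp);
        }
        return results;
    }

    // Assumption: the added columns are merged into a copy of the input row.
    static List<Map<String, Object>> expand(List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            for (Map<String, Object> added : punchlet(row)) {
                Map<String, Object> merged = new LinkedHashMap<>(row);
                merged.putAll(added);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("column1", "hello");
        row.put("column2", true);
        row.put("column3", 17);
        for (Map<String, Object> r : expand(List.of(row))) {
            System.out.println(r);
        }
    }
}
```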
Resources
Output documents are appended to the output dataset. For each input document, the punchlet can produce either one output document or several (by setting the root tuple to an array).
You can also provide external resources to the punchlet, either through the resources setting or through a named subscription tagged with resource.
These resources are accessible from punchlet code through the following Java function:
```java
/**
 * Return a provided resource
 * @param resourceName name of the resource (subscription name or "resources" map key)
 * @param resourceType type of the resource
 * @return the resource
 */
public <T> T getResource(String resourceName, Class<T> resourceType)
```
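The signature above returns a resource looked up by name and typed by the Class argument. The sketch below shows one way a method with that shape can be implemented safely with Class.cast. It is a hypothetical registry written for illustration, not the Punch implementation. In particular, throwing IllegalArgumentException for an unknown name is an assumption; the real function may, for instance, return null instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a typed resource registry with the same shape as getResource.
final class ResourceRegistry {

    private final Map<String, Object> resources = new HashMap<>();

    // Register a resource under a name (e.g. a "resources" map key or a subscription name).
    void put(String name, Object resource) {
        resources.put(name, resource);
    }

    public <T> T getResource(String resourceName, Class<T> resourceType) {
        Object resource = resources.get(resourceName);
        if (resource == null) {
            // Assumption: unknown names are reported as an error.
            throw new IllegalArgumentException("unknown resource: " + resourceName);
        }
        return resourceType.cast(resource); // throws ClassCastException on a type mismatch
    }

    public static void main(String[] args) {
        ResourceRegistry registry = new ResourceRegistry();
        registry.put("countries", List.of("fr", "de"));
        List<?> countries = registry.getResource("countries", List.class);
        System.out.println(countries.size()); // prints 2
    }
}
```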
Warning
You must use this node instead of punch_stage if you need to provide a resource from another node during punchlet execution.
Settings
punchlet_code
: String, default "{}". Punchlet code. Overrides punchlet_code_file.
punchlet_code_file
: String. Punchlet code file, readable from the driver.
input_columns
: [String]. If not set, all the dataset row columns are visible to the punchlet. Set input_columns to expose only the listed columns.
output_columns
: [Json]. List of additional columns, i.e. the ones added by the punchlet, each declared with a type and a field name.
resources
: Json. Map of resources provided during punchlet execution.
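The settings list states that punchlet_code overrides punchlet_code_file and defaults to "{}". The sketch below spells out that precedence as a small resolver. It assumes the file is only read when no inline code is given, and that "{}" applies when neither setting is present. PunchletCodeSettings and resolve are hypothetical names, not part of the node's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the precedence between punchlet_code and punchlet_code_file.
final class PunchletCodeSettings {

    static final String DEFAULT_CODE = "{}";

    // punchlet_code overrides punchlet_code_file; with neither set, the default "{}" is used.
    static String resolve(String punchletCode, Path punchletCodeFile) {
        if (punchletCode != null) {
            return punchletCode; // the file, if any, is never read
        }
        if (punchletCodeFile != null) {
            try {
                return Files.readString(punchletCodeFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return DEFAULT_CODE;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("example", ".punch");
        Files.writeString(file, "{ [column5] = [column3] * 2; }");
        System.out.println(resolve(null, file));           // file content is used
        System.out.println(resolve("{ [a] = 1; }", file)); // inline code wins
        System.out.println(resolve(null, null));           // prints {}
        Files.delete(file);
    }
}
```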
Info
You can use the ''' multiline string syntax of HJSON to include punchlet code in a more readable, unescaped form. An example follows.