Punch

The punch node lets you run a punchlet on incoming Dataset rows. Say you receive a dataset as follows:

|  column1 | column2 | column3 |
|  "hello" | true    | 17      |
 ...

The punchlet receives a JSON document like this:

  {
    "column1" : "hello",
    "column2" : true,
    "column3" : 17
  }

The punchlet can produce one or several additional columns. Here is a punchlet that will simply add two additional columns:

{
  [column4] = [column1] + " world";
  [column5] = [column3] * 2;
}

This produces the following rows:

|  column1 | column2 | column3 | column4        | column5 |
|  "hello" | true    | 17      | "hello world"  | 34      |
 ...
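
To make the row-to-document mapping concrete, here is a minimal plain-Java sketch that mimics what the punchlet above does to one row. It is only an illustration: a `Map` stands in for the JSON document, and the class name `AddColumnsSketch` and its `apply` method are hypothetical, not part of the Punch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for the punchlet above. A Map models the JSON document;
// this is NOT the Punch Tuple API, only an illustration of the data flow.
class AddColumnsSketch {

    // Mirrors: [column4] = [column1] + " world"; [column5] = [column3] * 2;
    static Map<String, Object> apply(Map<String, Object> row) {
        // Copy the row so the input columns are kept alongside the new ones.
        Map<String, Object> out = new LinkedHashMap<>(row);
        out.put("column4", row.get("column1") + " world");
        out.put("column5", ((Number) row.get("column3")).intValue() * 2);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("column1", "hello");
        row.put("column2", true);
        row.put("column3", 17);
        // Prints the input columns followed by column4 and column5.
        System.out.println(apply(row));
    }
}
```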

To achieve this, use the following node definition:

  {
    type: punch
    component: punch
    settings:
    {
      punchlet_code: 
      '''
      {
        [column4] = [column1] + " world";
        [column5] = [column3] * 2;
      }
      '''
      output_columns: [
        {
          type: string
          field: column4
        }
        {
          type: integer
          field: column5
        }
      ]
    }
    subscribe:
    [
      {
        component: input
        stream: documents
      }
    ]
    publish:
    [
      {
        stream: documents
      }
    ]
  }

Selecting Input Columns

You can expose only some of the input columns to the punchlet using the input_columns property, as follows:

  {
    type: punch
    component: punch
    settings:
    {
      punchlet_code:
      '''
      {
        [column4] = [column1] + " world";
        [column5] = [column3] * 2;
      }
      '''
      input_columns: [ column1 , column3 ]
      ...
    }
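
As a rough illustration of the effect of input_columns, the following plain-Java sketch projects a row onto the listed columns before it would be handed to the punchlet. The class `InputColumnsSketch` and its `select` method are hypothetical names, and silently skipping a listed column that is absent from the row is an assumption, not documented node behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models which columns the punchlet would see
// when input_columns is set. Not the node's actual implementation.
class InputColumnsSketch {

    // Keep only the listed columns, in the order they are listed.
    // Assumption: a listed column missing from the row is skipped.
    static Map<String, Object> select(Map<String, Object> row, List<String> inputColumns) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (String name : inputColumns) {
            if (row.containsKey(name)) {
                doc.put(name, row.get(name));
            }
        }
        return doc;
    }
}
```

With `input_columns: [ column1 , column3 ]`, the document built this way contains column1 and column3 but not column2.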

Generating Several Rows

Your punchlet can output an array of values instead of a single JSON document. In that case, one row is generated in the output dataset for each element of the array.

      punchlet_code:
      '''
      {
        Tuple results;
        for (int i = 0; i < 3; i++) {
          Tuple tmp;
          tmp:[column4] = [column1] + " " + i;
          tmp:[column5] = [column3] + i;
          results.append(tmp);
        }
        // this is a notation to overwrite the top level document
        root:/ = results;
      }
      '''

This produces the following rows:

|  column1 | column2 | column3 | column4    | column5 |
|  "hello" | true    | 17      | "hello 0"  | 17      |
|  "hello" | true    | 17      | "hello 1"  | 18      |
|  "hello" | true    | 17      | "hello 2"  | 19      |
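
The one-to-many behavior can be sketched in plain Java as follows. This is a simplified model, not the Punch Tuple API: a `List` of `Map`s stands in for the array set as root tuple, and the names `MultiRowSketch` and `expand` are hypothetical. Each generated map starts as a copy of the input row to reflect the table above, where the input columns are repeated on every output row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of a punchlet that emits several rows per input row.
class MultiRowSketch {

    // Produce `copies` output rows from one input row.
    static List<Map<String, Object>> expand(Map<String, Object> row, int copies) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            // Start from the input columns, then add the generated ones.
            Map<String, Object> tmp = new LinkedHashMap<>(row);
            tmp.put("column4", row.get("column1") + " " + i);
            tmp.put("column5", ((Number) row.get("column3")).intValue() + i);
            results.add(tmp);
        }
        return results;
    }
}
```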

Resources

Output documents are appended to the output dataset. For each input document, the punchlet can provide either a single output document or several output documents (by setting an array as the root tuple).

You can also provide external resources by adding them to the resources setting. These resources are accessible in the punchlet code through the following Java function:

/**
 * Return a provided resource
 * @param resourceName name of the resource (subscription name or "resources" map key)
 * @param resourceTtype type of the resource
 * @return the resource
 */
public <T> T getResource(String resourceName, Class<T> resourceTtype)
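
To show how a typed lookup with this signature can behave, here is a minimal sketch of a name-to-object registry. It is not the node's implementation: the class `ResourceRegistrySketch`, its `put` method, and the choice to throw on an unknown name are all assumptions made for illustration. Only the `getResource` signature comes from the documentation above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for whatever holds the resources at runtime.
class ResourceRegistrySketch {

    private final Map<String, Object> resources = new HashMap<>();

    // Register a resource under a name (hypothetical helper).
    void put(String name, Object value) {
        resources.put(name, value);
    }

    // Same shape as the documented function: look up by name,
    // then cast to the type requested by the caller.
    public <T> T getResource(String resourceName, Class<T> resourceType) {
        Object value = resources.get(resourceName);
        if (value == null) {
            // Assumption: an unknown name is reported as an error.
            throw new IllegalArgumentException("unknown resource: " + resourceName);
        }
        // Class.cast throws ClassCastException if the stored type does not match.
        return resourceType.cast(value);
    }
}
```

In this model, asking for a resource with the wrong class fails at lookup time rather than later in the punchlet, which is why the requested type should match what the providing node actually returns.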

Warning

You must use this node instead of punch_stage if you need to provide a resource from another node during punchlet execution.

Here is an example that shows how to provide resources to the punch node:

    {
        type: file_model_input
        component: model_loader
        settings:
        {
            file_path: model.bin
        }
        publish:
        [
            {
                stream: model
            }
        ]
        subscribe: []
    }

    {      
        type: punch
        component: punch
        settings:
        {
            punchlet_code: 
            '''
            {   
                [base64] = [name].encodeBase64();
                [decade] = [age] % 10;
                [pipelineModel] = getResource("resource_1",PipelineModel.class);
                [my_resource] = getResource("resource_2",String.class);
                // Do something with my resources
                print(root);
            }
            '''
            output_columns: [
                {
                    type: string
                    field: base64
                }
                {
                    type: integer
                    field: decade
                }

            ]
            resources: {
                resource_1: model_loader_model
                resource_2 : hello
            }


        }
        subscribe:
        [
            {
                component: input
                stream: data
            }
            {
                component: model_loader
                stream: model
            }

        ]
        publish:
        [
            {
                stream: data
            }
        ]
    }

Info

This example uses two resources of two different types. The first is computed within the job by the file_model_input node. To use it, set the resource value to component_stream (here: model_loader_model) and, within the punchlet, request the type returned by the file_model_input node (here: PipelineModel.class). The second is a constant resource of type String. You can define a constant of any type (Integer, String, ...) in order to use it in your punchlet.
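
The component_stream naming convention can be sketched as below. The helper names `streamResourceName` and `classify` are hypothetical, and telling stream-backed resources from constants by comparing the value against the subscribed stream names is an assumption made for illustration; the documentation above does not describe how the node resolves them internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative helpers for the "component_stream" resource naming.
class ResourceNamingSketch {

    // Builds the name described above, e.g. "model_loader" + "model".
    static String streamResourceName(String component, String stream) {
        return component + "_" + stream;
    }

    // Assumption: a resource value equal to a subscribed component_stream
    // name refers to that stream; any other value is treated as a constant.
    static Map<String, String> classify(Map<String, String> resources, Set<String> subscribedNames) {
        Map<String, String> kinds = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : resources.entrySet()) {
            kinds.put(e.getKey(), subscribedNames.contains(e.getValue()) ? "stream" : "constant");
        }
        return kinds;
    }
}
```

For the node definition above, `streamResourceName("model_loader", "model")` yields `model_loader_model`, the value given to resource_1.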

Settings

  • punchlet_code: String : "{}"

    Punchlet code. Overrides "punchlet_code_file".

  • punchlet_code_file: String

    Path to a punchlet code file readable from the driver.

  • input_columns: [String]

    If not set, all the dataset row columns will be visible to the punchlet. You can specifically narrow the number of exposed columns by defining input_columns.

  • output_columns: [Json]

    List of additional columns, i.e. the ones added by the punchlet. Each entry defines the column's type and field.

  • resources: Json

    Map of resources provided during punchlet execution

Info

You can use the ''' special hjson tag to include punchlet code in a more readable and unescaped way, as in the node definitions above.