
Punch Storm Integration

This chapter explains how to run punchlets in a Storm topology. Punchlets can be embedded in the PunchPlatform "punch_bolt".

Using punchlets, you can take several actions on the traversing data:

  1. transform: parse and enrich the data.
  2. route: forward the data to a next bolt or to a third-party application such as Kafka or Elasticsearch.
  3. filter: drop some data depending on its content.
  4. generate: produce alerts or correlation events.

The rest of this chapter summarizes the configuration required to run your punchlet.

Basics

The simplest configuration looks like this:

{
     "type" : "punch_bolt",
     "bolt_settings" : { 
         "punchlet" : "standard/common/input.punch"
     },
     "storm_settings" : { ... }
 }    

The "punchlet" property refers to a punchlet file, which is looked up in the $PUNCHPLATFORM_CONF_DIR/resources/punch directory. In the example above, the relative path resolves to $PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/input.punch.

The same lookup is used to load a punchlet and any companion resource files. Alternatively, you can give an absolute path, starting with a '/' character.
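The lookup rule described above can be sketched as follows. This is only an illustration in Python, not the platform's code: the function name `resolve_punchlet_path` and the sample value used for $PUNCHPLATFORM_CONF_DIR are assumptions made for the example.

```python
from pathlib import PurePosixPath


def resolve_punchlet_path(punchlet: str, conf_dir: str) -> str:
    """Return the file location for a "punchlet" setting value.

    Hypothetical helper: absolute paths are kept as they are, relative
    paths are looked up under <conf_dir>/resources/punch.
    """
    path = PurePosixPath(punchlet)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(conf_dir) / "resources" / "punch" / path)


# Hypothetical value of $PUNCHPLATFORM_CONF_DIR, for illustration only.
conf_dir = "/opt/punchplatform/conf"
print(resolve_punchlet_path("standard/common/input.punch", conf_dir))
print(resolve_punchlet_path("/tmp/input.punch", conf_dir))
```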

For the full list of available usages, see the Punch Bolt documentation section.

Grok Patterns

Punch comes with a grok operator that uses standard grok patterns loaded from:

$PUNCHPLATFORM_CONF_DIR/resources/punch/pattern

If you need more patterns, you can add pattern files there. If you need your own patterns, visible only to some topologies, channels or tenants, you can put them elsewhere and configure your topology as follows:

{
     "type" : "punch_bolt",
     "bolt_settings" : { 
         "punchlet_grok_pattern_dirs" : [
             "pattern",
             "myadditionalpatterns"
         ],
         "punchlet" : "mydirectory/mypunchlet.punch"
     },
     "storm_settings" : { ... }
 }    

Note that:

  1. all patterns are loaded from all the files found in the directories you define.
  2. patterns are loaded in order, so you can override a common pattern by defining it in one of your own pattern files.
  3. if you use this setting and only want to add a few patterns while still relying on the common set, you must explicitly list the default pattern directory as well (the example above lists "pattern" alongside "myadditionalpatterns").
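The override behaviour in the notes above can be illustrated with a small Python sketch. It assumes the usual grok pattern file layout (one `NAME regex` definition per line, `#` for comments); the function name `load_grok_patterns`, the file names and the order in which files of one directory are read are assumptions of this example, not a description of the Punch loader.

```python
import os
import tempfile


def load_grok_patterns(pattern_dirs):
    """Load 'NAME regex' definitions from every file of every directory.

    Directories are visited in the given order, so a definition found
    later replaces an earlier one with the same name.
    """
    patterns = {}
    for directory in pattern_dirs:
        # Sorting file names is an assumption made to keep the sketch deterministic.
        for file_name in sorted(os.listdir(directory)):
            file_path = os.path.join(directory, file_name)
            if not os.path.isfile(file_path):
                continue
            with open(file_path, encoding="utf-8") as handle:
                for line in handle:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    parts = line.split(None, 1)
                    if len(parts) == 2:
                        patterns[parts[0]] = parts[1]
    return patterns


# Build two throwaway directories that mimic the configuration above.
with tempfile.TemporaryDirectory() as root:
    common_dir = os.path.join(root, "pattern")
    custom_dir = os.path.join(root, "myadditionalpatterns")
    os.makedirs(common_dir)
    os.makedirs(custom_dir)
    with open(os.path.join(common_dir, "base"), "w", encoding="utf-8") as f:
        f.write("USERNAME [a-zA-Z0-9._-]+\nINT [+-]?[0-9]+\n")
    with open(os.path.join(custom_dir, "mine"), "w", encoding="utf-8") as f:
        f.write("# stricter user names\nUSERNAME [a-z]+\n")
    demo_patterns = load_grok_patterns([common_dir, custom_dir])

# USERNAME now holds the custom definition, INT still comes from the common set.
print(demo_patterns)
```

Leaving the common directory out of the list would lose INT in this sketch, which is the situation the third note warns about.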

JSON Resource Files

You can associate JSON resource files with your punchlet. They are loaded and made available to it as Punch Tuples. This feature makes it very simple to perform normalisation or enrichment. Here is an example. Say you have a file named intervals.json with the following contents:

[
 { "user" : "bob", "min" : 0, "max" : 10 },
 { "user" : ["ted", "bob"], "min" : 5, "max" : 7 },
 { "user" : "dimi", "min" : 11, "max" : 20 },
 { "user" : "flo", "min" : 8, "max" : 12 }
]

If you put that file somewhere in the $PUNCHPLATFORM_CONF_DIR/resources/punch directory, for example in $PUNCHPLATFORM_CONF_DIR/resources/punch/mydirectory/intervals.json, you can then refer to it in your punchlet as in the following example:

Tuple result;
Tuple intervals = getResourceTuple("intervals");
findByInterval(intervals).on(6).into(result);

In this example you will get the following in the result Tuple:

[
  { "user" : "bob", "min" : 0, "max" : 10 },
  { "user" : ["ted", "bob"], "min" : 5, "max" : 7 }
]
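To make the lookup concrete, here is a rough Python equivalent of what the example computes. It is a sketch, not the Punch operator: `find_by_interval` is a hypothetical function, and treating both bounds as inclusive is an assumption (the value 6 used above falls strictly inside the matching intervals, so the example does not settle that point).

```python
import json

# Same contents as the resource file shown earlier.
INTERVALS_JSON = """
[
  { "user" : "bob", "min" : 0, "max" : 10 },
  { "user" : ["ted", "bob"], "min" : 5, "max" : 7 },
  { "user" : "dimi", "min" : 11, "max" : 20 },
  { "user" : "flo", "min" : 8, "max" : 12 }
]
"""


def find_by_interval(intervals, value):
    """Keep the entries whose [min, max] range contains the value.

    Assumption: both bounds are treated as inclusive in this sketch.
    """
    return [entry for entry in intervals if entry["min"] <= value <= entry["max"]]


intervals = json.loads(INTERVALS_JSON)
result = find_by_interval(intervals, 6)
print(json.dumps(result, indent=2))
```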

Note

The Punch findByKey and findByInterval operators require an input Tuple loaded from such resource files, but you can use these resource tuples in many other ways.

Coming back to the way you refer to a resource tuple in a punchlet: you simply use the short name of the file, without the '.json' extension. You must, however, make sure the corresponding file is loaded and available to your punchlet. You do that by configuring the running environment of your punchlet.

For example, if you run a Storm topology, you need the following configuration:

{
  "type": "punch_bolt",
  "bolt_settings": {
    # by default, json file paths are relative to the 
    # $PUNCHPLATFORM_CONF_DIR/resources/punch 
    "punchlet_json_resources": [
      "mydirectory/intervals.json"
    ],

    # punchlet paths are also relative to 
    # $PUNCHPLATFORM_CONF_DIR/resources/punch
    "punchlet": "mydirectory/mypunchlet.punch"
  },
  "storm_settings": {
    "component": "punch_bolt",
    "subscribe": [
        ...
    ],
    "publish": [
        ...
    ]
  }
}   

Note

The JSON file paths in the topology configuration file are relative to the $PUNCHPLATFORM_CONF_DIR/resources/punch folder.
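The link between the "punchlet_json_resources" entries and the short names used in the punchlet can be pictured with the Python sketch below. The helper `resource_short_names` is hypothetical, and what the platform does when two files share the same short name is not covered here; in this sketch the last one simply wins.

```python
from pathlib import PurePosixPath


def resource_short_names(json_resources):
    """Map each short name (file name without '.json') to its configured path."""
    names = {}
    for resource in json_resources:
        file_name = PurePosixPath(resource).name
        if file_name.endswith(".json"):
            file_name = file_name[: -len(".json")]
        # Assumption of this sketch: a later entry replaces an earlier one.
        names[file_name] = resource
    return names


short_names = resource_short_names(["mydirectory/intervals.json"])
print(short_names)
```

With the configuration above, the punchlet would therefore ask for "intervals", not for "mydirectory/intervals.json".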

You can load resource files from other directories using the %{conf}%, %{tenant}% and %{channel}% directives as follows.

{
  "type": "punch_bolt",
  "bolt_settings": {

    "punchlet_json_resources": [
      # by default, json file paths are relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
      # this one would be located at:
      # $PUNCHPLATFORM_CONF_DIR/resources/punch/mydirectory/intervals.json
      "mydirectory/intervals.json", 

      # $PUNCHPLATFORM_CONF_DIR/mydirectory/intervals.json 
      "%{conf}%/mydirectory/intervals.json",

      # $PUNCHPLATFORM_CONF_DIR/tenants/<this_tenant>/mydirectory/intervals.json 
      "%{tenant}%/mydirectory/intervals.json",

      # $PUNCHPLATFORM_CONF_DIR/tenants/<this_tenant>/channel/<this_channel>/mydirectory/intervals.json 
      "%{channel}%/mydirectory/intervals.json",

      # Absolute paths are accepted but make no sense in a production environment;
      # they are only useful for development
      "/tmp/intervals.json"
    ],

    # punchlet paths are also relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
    "punchlet": "mydirectory/mypunchlet.punch"
  },
  "storm_settings": {
    ...
  }
}
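The resolution rules spelled out in the comments of the configuration above can be summarised in a short Python sketch. It only mirrors those comments: `resolve_resource_path` is a hypothetical function, and the sample configuration directory, tenant and channel names are made up for the example.

```python
from pathlib import PurePosixPath


def resolve_resource_path(resource, conf_dir, tenant, channel):
    """Expand %{conf}%, %{tenant}% and %{channel}% as described in the comments."""
    conf = PurePosixPath(conf_dir)
    tenant_dir = conf / "tenants" / tenant
    roots = {
        "%{conf}%": conf,
        "%{tenant}%": tenant_dir,
        "%{channel}%": tenant_dir / "channel" / channel,
    }
    for directive, root in roots.items():
        prefix = directive + "/"
        if resource.startswith(prefix):
            return str(root / resource[len(prefix):])
    if resource.startswith("/"):
        # Absolute path: used as is (development only).
        return resource
    # Default: relative to <conf_dir>/resources/punch.
    return str(conf / "resources" / "punch" / resource)


# Hypothetical environment, for illustration only.
for entry in [
    "mydirectory/intervals.json",
    "%{conf}%/mydirectory/intervals.json",
    "%{tenant}%/mydirectory/intervals.json",
    "%{channel}%/mydirectory/intervals.json",
    "/tmp/intervals.json",
]:
    print(entry, "->", resolve_resource_path(entry, "/conf", "mytenant", "mychannel"))
```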

Siddhi Rules

To run Siddhi rules, you can either use the dedicated Cep Bolt or embed your rule in a punchlet. This section shows how to do the latter. In both cases you must include your rule in your topology file, so that it is loaded and made available at runtime.

Say you have a Siddhi rule stored in a file named low_battery.rule, for example in:

$PUNCHPLATFORM_CONF_DIR/resources/siddhi/cep/low_battery.rule

It contains the rule you want to embed in your punchlet, for example:

//
// Generate an event whenever a battery level drops lower than some threshold
// For the sake of this example this rule performs the following:
//   - it considers only the last 5 events using the window builtin function
//   - it computes the battery average value using the builtin avg function
//   - it receives data on a stream "input"
//   - if the rule is matched it generates data on the stream "output"
//
define stream input (local_uuid string, battery float, device_id int); 
@info(name = 'query') from input[battery < 0.2]#window.length(5)  
  select battery, local_uuid, avg(battery)  
  as battery_avg group by device_id   
  insert into output
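For readers less familiar with Siddhi, the following Python sketch approximates what the comments of the rule describe: keep only events whose battery is below 0.2, retain the last 5 of them, and emit the average battery of the events in that window that belong to the same device. It is one reading of the rule, written for illustration; the function name and the sample events are invented, and the real Siddhi engine remains the reference for the exact semantics.

```python
from collections import deque


def run_low_battery_rule(events, threshold=0.2, window_length=5):
    """Approximate the low_battery rule on a list of input events."""
    window = deque(maxlen=window_length)  # last N events that passed the filter
    output = []
    for event in events:
        if not event["battery"] < threshold:
            continue  # filtered out by input[battery < 0.2]
        window.append(event)
        # Average over the windowed events sharing the same device_id.
        same_device = [e["battery"] for e in window
                       if e["device_id"] == event["device_id"]]
        output.append({
            "battery": event["battery"],
            "local_uuid": event["local_uuid"],
            "battery_avg": sum(same_device) / len(same_device),
        })
    return output


# Invented sample events.
events = [
    {"local_uuid": "a", "battery": 0.5, "device_id": 1},
    {"local_uuid": "b", "battery": 0.1, "device_id": 1},
    {"local_uuid": "c", "battery": 0.15, "device_id": 2},
    {"local_uuid": "d", "battery": 0.05, "device_id": 1},
]
alerts = run_low_battery_rule(events)
for alert in alerts:
    print(alert)
```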

You can then refer to your rule in a punchlet as follows (refer to CEP rules for details):

{
  ...
  siddhi("low_battery").send(input).into(output);  

} 

To have the rule loaded and made available to your punchlet, add it to your Storm topology file as follows:

{
  "type": "punch_bolt",
  "bolt_settings": {

    # Watch out: by default, rule file paths are relative to the 
    # $PUNCHPLATFORM_CONF_DIR/resources root directory
    "punchlet_rule_resources" : [
      "%{conf}%/siddhi/cep/low_battery.rule"
    ],

    # Watch out: punchlet paths are relative to the 
    # $PUNCHPLATFORM_CONF_DIR/resources/punch root
    "punchlet" : "mydirectory/mypunchlet.punch"
  },
  "storm_settings" : {
    ...
  }
}

Note that the root directories for Siddhi rules and for punchlets are not the same.

Regular Resource Files

You can make regular text files visible to punchlets and use them however you see fit, for example to work with email templates.

String template = getResource("mydirectory/mail.html");

For this to work, you need the following topology configuration:

{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_file_resources": [
      "mydirectory/mail.html"
    ],
    "punchlet": "mydirectory/mypunchlet.punch"
  },
  "storm_settings": {
     ...
  }
}   

Property Resource Files

You can also load Java property files. All properties are made available to your punchlet through the getProperties() method.

Properties props = getProperties();

For this to work you need the following topology configuration:

{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_property_resources": [
      "mydirectory/properties"
    ],
    "punchlet": "mydirectory/mypunchlet.punch"
  },
  "storm_settings": { ...
  }
}
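As a reminder of what such a file holds, here is a deliberately simplified Python reader for the Java properties format. It is only a sketch: it handles `key=value` and `key: value` lines plus `#` and `!` comments, and ignores escapes, line continuations and whitespace-only separators that the real format supports. The property names in the sample are invented.

```python
def parse_properties(text):
    """Parse a simplified subset of the Java .properties format."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        # Split on the first '=' or ':' found on the line.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            properties[line] = ""
            continue
        split_at = min(positions)
        properties[line[:split_at].strip()] = line[split_at + 1:].strip()
    return properties


# Invented sample contents for a file such as mydirectory/properties.
SAMPLE = """
# hypothetical mail settings
mail.subject = Low battery alert
mail.sender: noreply@example.org
"""

props = parse_properties(SAMPLE)
print(props)
```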