Storm & Punchlets

This chapter explains how to run punchlets in a Storm topology. Punchlets can be embedded in PunchPlatform spouts or bolts. In particular, a dedicated bolt is provided: the Punch bolt.

Using punchlets, you can do four things to the data traversing the topology:

  1. transform: parse or enrich the data.
  2. route: forward data to a next bolt, or to a third-party application such as Kafka or Elasticsearch.
  3. filter: drop some data based on its content.
  4. generate: produce alert or correlation events.

The rest of this chapter summarizes the configuration required to run your punchlet.

Basics

The simplest configuration looks like this:

  {
       "type" : "...",
       "spout_or_bolt_settings" : {
           "punchlet" : "standard/common/input.punch"
       },
       "storm_settings" : { ... }
   }

The punchlet property refers to a punchlet file. It is looked up in the $PUNCHPLATFORM_CONF_DIR/resources/punch directory. In this example, the absolute path is:

$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/input.punch

The same path lookup is used to load a punchlet and its (potential) companion resource files. Alternatively, you can give an absolute path, starting with a '/' character.
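The lookup rule described above can be sketched as follows. This is only an illustration in Python, not the platform's implementation; the function name resolve_punchlet and the /opt/punch/conf directory are hypothetical.

```python
from pathlib import PurePosixPath

def resolve_punchlet(punchlet: str, conf_dir: str) -> str:
    """Return the file path that a "punchlet" setting points to."""
    if punchlet.startswith("/"):
        # Absolute paths are used as given.
        return punchlet
    # Relative paths are looked up under <conf_dir>/resources/punch.
    return str(PurePosixPath(conf_dir) / "resources" / "punch" / punchlet)

# "/opt/punch/conf" stands in for the value of $PUNCHPLATFORM_CONF_DIR.
print(resolve_punchlet("standard/common/input.punch", "/opt/punch/conf"))
```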

Grok Patterns

Punch comes with a grok operator that uses standard grok patterns loaded from

$PUNCHPLATFORM_CONF_DIR/resources/punch/pattern

If you need additional patterns, you can add more pattern files there. If you need your own patterns, visible only to some topologies, channels or tenants, you can put them elsewhere and configure your topology as follows:

  {
       "type" : "...",
       "spout_or_bolt_settings" : {
           "punchlet_grok_pattern_dirs" : [
               "pattern",
               "myadditionalpatterns"
           ],
           "punchlet" : "mydirectory/mypunchlet.punch"
       },
       "storm_settings" : { ... }
   }

Note that:

  1. all patterns are loaded from all the files found in the directories you define.
  2. patterns are loaded in order; you can thus override a common pattern by redefining it in one of your own pattern files.
  3. if you use this setting only to add a few patterns while still relying on the common set, you must explicitly include the default directory (i.e. "pattern").
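To make the ordering rule concrete, here is a minimal Python sketch of how such a merge could behave. It assumes the usual grok pattern file layout (one "NAME regex" definition per line) and that a later definition replaces an earlier one; the function load_patterns, the file names and the pattern contents are hypothetical, and this is not the platform's actual loader.

```python
import tempfile
from pathlib import Path

def load_patterns(pattern_dirs):
    """Merge grok pattern files found in the given directories, in order."""
    patterns = {}
    for directory in pattern_dirs:
        for path in sorted(Path(directory).iterdir()):
            if not path.is_file():
                continue
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                parts = line.split(None, 1)  # "NAME regex"
                if len(parts) == 2:
                    patterns[parts[0]] = parts[1]  # a later definition wins
    return patterns

# Demo with two throwaway directories standing in for the configured ones.
with tempfile.TemporaryDirectory() as root:
    common = Path(root, "pattern")
    extra = Path(root, "myadditionalpatterns")
    common.mkdir()
    extra.mkdir()
    (common / "base").write_text("USERNAME [a-z]+\nINT [0-9]+\n", encoding="utf-8")
    (extra / "custom").write_text("USERNAME [a-zA-Z0-9._-]+\n", encoding="utf-8")
    merged = load_patterns([common, extra])

print(merged["USERNAME"])  # the definition from "myadditionalpatterns"
```

Leaving "pattern" out of the list in this sketch would drop INT from the merged set, which is why the default directory has to be listed explicitly.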

JSON Resource Files

You can associate JSON resource files with your punchlet. They are loaded and made available to it as Punch Tuples. This feature makes it very simple to perform normalisation or enrichment. Here is an example. Say you have a file named "intervals.json" with the following contents:

 [
  { "user" : "bob", "min" : 0, "max" : 10 },
  { "user" : ["ted", "bob"], "min" : 5, "max" : 7 },
  { "user" : "dimi", "min" : 11, "max" : 20 },
  { "user" : "flo", "min" : 8, "max" : 12 }
 ]

Put that file somewhere under the $PUNCHPLATFORM_CONF_DIR/resources/punch directory, for example in $PUNCHPLATFORM_CONF_DIR/resources/punch/mydirectory/intervals.json. You can then refer to it in your punchlet as in the following example:

Tuple result;
Tuple intervals = getResourceTuple("intervals");
findByInterval(intervals).on(6).into(result);

In this example, you get the following in the result Tuple:

[
  { "user" : "bob", "min" : 0, "max" : 10 },
  { "user" : ["ted", "bob"], "min" : 5, "max" : 7 }
]
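The result above is consistent with a simple range lookup: an entry is kept when the searched value lies between its "min" and "max". The Python sketch below reproduces that behaviour on the same data. It assumes both bounds are inclusive, which the example does not settle, and find_by_interval is a hypothetical stand-in, not the Punch operator itself.

```python
intervals = [
    {"user": "bob", "min": 0, "max": 10},
    {"user": ["ted", "bob"], "min": 5, "max": 7},
    {"user": "dimi", "min": 11, "max": 20},
    {"user": "flo", "min": 8, "max": 12},
]

def find_by_interval(entries, value):
    """Keep the entries whose [min, max] range contains value (bounds assumed inclusive)."""
    return [entry for entry in entries if entry["min"] <= value <= entry["max"]]

result = find_by_interval(intervals, 6)
for entry in result:
    print(entry)
```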

Note

The Punch findByKey and findByInterval operators require an input Tuple loaded from such resource files, but you can use resource Tuples in many other ways.

Coming back to the way you refer to a resource Tuple in a punchlet: you simply use the short name of the file, without the json extension, i.e. getResourceTuple("intervals"). You must however make sure the corresponding file is loaded and available to your punchlet. You do that by configuring the running environment of your punchlet.

For example, if you run a Storm topology, you need the following configuration:

  {
       "type" : "...",
       "punch_bolt" : {

           # by default, JSON file paths are relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
           "punchlet_json_resources" : [
               "mydirectory/intervals.json"
           ],

           # punchlet paths are also relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
           "punchlet" : "mydirectory/mypunchlet.punch"
       },
       "storm_settings" : { ... }
   }

Note

The JSON file paths in the topology configuration file are relative to the $PUNCHPLATFORM_CONF_DIR/resources/punch folder.

You can load resource files from other directories using the %{conf}, %{tenant} and %{channel} directives as follows.

  {
       "type" : "...",
       "punch_bolt" : {

           # by default, JSON file paths are relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
           "punchlet_json_resources" : [
               # $PUNCHPLATFORM_CONF_DIR/resources/punch/mydirectory/intervals.json
               "mydirectory/intervals.json",

               # $PUNCHPLATFORM_CONF_DIR/mydirectory/intervals.json
               "%{conf}%/mydirectory/intervals.json",

               # $PUNCHPLATFORM_CONF_DIR/tenants/<this_tenant>/mydirectory/intervals.json
               "%{tenant}%/mydirectory/intervals.json",

               # $PUNCHPLATFORM_CONF_DIR/tenants/<this_tenant>/channel/<this_channel>/mydirectory/intervals.json
               "%{channel}%/mydirectory/intervals.json",

               # Absolute paths are accepted but make no sense in a production environment;
               # they are only useful for development
               "/tmp/intervals.json"
           ],

           # punchlet paths are also relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
           "punchlet" : "mydirectory/mypunchlet.punch"
       },
       "storm_settings" : { ... }
   }
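The comments in the configuration above describe where each kind of entry ends up. The Python sketch below restates that mapping as code. It is an illustration only: resolve_resource, the /conf directory and the tenant and channel names are hypothetical, and the directives are assumed to be written as a "%{...}%/" prefix inside the quoted string.

```python
from pathlib import PurePosixPath

def resolve_resource(entry, conf_dir, tenant, channel):
    """Map a resource entry to a file path, following the comments in the configuration."""
    conf = PurePosixPath(conf_dir)
    roots = {
        "%{conf}%/": conf,
        "%{tenant}%/": conf / "tenants" / tenant,
        "%{channel}%/": conf / "tenants" / tenant / "channel" / channel,
    }
    for prefix, root in roots.items():
        if entry.startswith(prefix):
            return str(root / entry[len(prefix):])
    if entry.startswith("/"):
        return entry  # absolute path, used as given
    return str(conf / "resources" / "punch" / entry)  # default root

# "/conf" stands in for $PUNCHPLATFORM_CONF_DIR.
for entry in ("mydirectory/intervals.json",
              "%{tenant}%/mydirectory/intervals.json",
              "/tmp/intervals.json"):
    print(resolve_resource(entry, "/conf", "mytenant", "mychannel"))
```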

Siddhi Rules

To run Siddhi rules you can either use the dedicated Cep Bolt, or embed your rule in a punchlet. Here is how you do the latter. In both cases you must include your rule in your topology file, so that it is loaded and made available at runtime.

Say you have a Siddhi rule stored in a file named low_battery.rule, for example in:

$PUNCHPLATFORM_CONF_DIR/resources/siddhi/cep/low_battery.rule

It contains the rule you want to embed in your punchlet. For example :

//
// Generate an event whenever a battery level drops lower than some threshold
// For the sake of this example this rule performs the following:
//   - it considers only the last 5 events using the window builtin function
//   - it computes the battery average value using the builtin avg function
//   - it receives data on a stream "input"
//   - if the rule is matched it generates data on the stream "output"
//
define stream input (local_uuid string, battery float, device_id int);
@info(name = 'query') from input[battery < 0.2]#window.length(5) \
  select battery, local_uuid, avg(battery) \
  as battery_avg group by device_id \
  insert into output
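To give a rough idea of what this rule computes, the following Python sketch simulates it on a list of events. It is an approximation written for illustration, not Siddhi itself: it assumes the filter is applied before the window, that the window holds the last 5 filtered events across all devices, and that the average is taken over the events in the window sharing the current event's device_id. The function low_battery and the sample events are hypothetical.

```python
from collections import deque

def low_battery(events, threshold=0.2, window_length=5):
    """Approximate the rule: filter, keep the last N events, average per device."""
    window = deque(maxlen=window_length)  # oldest event drops out automatically
    output = []
    for event in events:
        if event["battery"] >= threshold:
            continue  # corresponds to input[battery < 0.2]
        window.append(event)
        same_device = [e["battery"] for e in window
                       if e["device_id"] == event["device_id"]]
        output.append({
            "battery": event["battery"],
            "local_uuid": event["local_uuid"],
            "battery_avg": sum(same_device) / len(same_device),
        })
    return output

events = [
    {"local_uuid": "a", "battery": 0.10, "device_id": 1},
    {"local_uuid": "b", "battery": 0.50, "device_id": 1},  # above threshold, ignored
    {"local_uuid": "c", "battery": 0.05, "device_id": 2},
    {"local_uuid": "d", "battery": 0.15, "device_id": 1},
]
alerts = low_battery(events)
for alert in alerts:
    print(alert)
```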

You can then refer to your rule in a punchlet as follows (Refer to Siddhi Rules for details):

{
  ...
  siddhi("low_battery").send(input).into(output);

}

I.e. you use the short name of the file, without the rule extension. To have the corresponding rule loaded and available to your punchlet, add it to your Storm topology file as follows:

  {
       "type" : "...",
       "punch_bolt" : {

           # Watch out: by default, rule file paths are relative to the
           # $PUNCHPLATFORM_CONF_DIR/resources root directory
           "punchlet_rule_resources" : [
               "%{conf}%/siddhi/cep/low_battery.rule"
           ],

           # Watch out: punchlet paths are relative to the
           # $PUNCHPLATFORM_CONF_DIR/resources/punch root
           "punchlet" : "mydirectory/mypunchlet.punch"
       },
       "storm_settings" : { ... }
   }

Warning

Note that the root directories for Siddhi rules and for punchlets are not the same.

Regular Resource Files

You can make regular text files visible to punchlets. You can use them in whatever way you need, for example to work with email templates.

String template = getResource("mydirectory/mail.html");

For this to work you need the following topology configuration:

  {
       "type" : "...",
       "spout_or_bolt_settings" : {
           "punchlet_file_resources" : [
               "mydirectory/mail.html"
           ],
           "punchlet" : "mydirectory/mypunchlet.punch"
       },
       "storm_settings" : { ... }
   }

Property Resource Files

You can also load Java property files. All properties are made available to your punchlet using the getProperties() method.

Properties props = getProperties();

For this to work you need the following topology configuration:

  {
       "type" : "...",
       "spout_or_bolt_settings" : {
           "punchlet_property_resources" : [
               "mydirectory/properties"
           ],
           "punchlet" : "mydirectory/mypunchlet.punch"
       },
       "storm_settings" : { ... }
   }