
PunchBolt

A punch bolt executes one or more punchlets on the fly. This bolt cannot be used to communicate with an external application; it is necessarily internal to a topology. Its configuration looks like this:

{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet": [
        "standard/common/input.punch",
        "standard/common/parsing_syslog_header.punch"
    ]
  },
  "storm_settings": {
    "component": "my_punch_bolt",
    "subscribe": [
      {
        "component": "kafka_spout",
        "stream": "logs"
      }
    ],
    "publish": [
      {
        "stream": "logs",
        "fields": [
          "log", "_ppf_id", "_ppf_timestamp"
        ]
      },
      {
        "stream": "_ppf_errors",
        "fields": [
          "_ppf_id", "_ppf_error_document", "_ppf_error_message", "_ppf_timestamp"
        ]
      }
    ]
  }
}

The key concept to understand is the relationship between the punchlet and the subscribed and published streams/fields. In this example your punchlet will receive punch tuples corresponding to the storm tuples received on the subscribed stream. When it returns, your punchlet will have generated an arbitrary punch tuple (depending on the punchlet logic). But the punch bolt will only emit the tuples/fields that match the publish section.
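
For illustration, here is a minimal punchlet sketch of that relationship (the extra field name is hypothetical). The subscribed storm tuple is visible as a punch tuple keyed by stream then field, and only the published fields leave the bolt:

{
    // the subscribed storm tuple is visible as [stream][field]
    print([logs][log]);

    // this field will be emitted: "log" is listed under the published
    // "logs" stream
    [logs][log] = "hello world";

    // this field will be silently dropped: "extra" does not appear in
    // the publish section
    [logs][extra] = "dropped";
}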

Info

Make sure you understand the spout and bolt stream and field fundamental concepts.

The _ppf_errors stream and its fields are explained below in the dedicated error handling section.

Punchlets

The punchlet property refers to the punchlet(s) you want to execute in the bolt. This property is a path, relative by default to the $PUNCHPLATFORM_CONF_DIR/resources/punch folder. For example, standard/common/input.punch resolves to $PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/input.punch.

Some punchlets require resource files, typically when they use the findByKey or findByInterval Punch operators.
Others use Siddhi rules that must equivalently be loaded. To add resource files to your punchlet, proceed as follows:

{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_json_resources": [
      "standard/apache_httpd/http_codes.json",
      "standard/apache_httpd/taxonomy.json"
    ],
    "punchlet_rule_resources": [
      "standard/common/detection.rule"
    ],
    "punchlet_grok_pattern_dirs": [
      "%{channel}%/patterns"
    ],
    "punchlet": [
      "standard/common/input.punch",
      "standard/common/parsing_syslog_header.punch",
      "standard/apache_httpd/parsing.punch",
      "standard/apache_httpd/enrichment.punch",
      "standard/apache_httpd/normalization.punch",
      "standard/common/geoip.punch"
    ]
  }
}
  • punchlet_json_resources (list): the required JSON resource files.
  • punchlet_grok_pattern_dirs (list): the paths of the grok pattern folders.

    All the files in these folders (not only those with the *.grok extension) will be registered as grok patterns. See the example pattern right after this list.

  • punchlet_rule_resources (list): the required rule files.

    All of these will be loaded prior to the punchlet execution. These properties must contain paths relative to the $PUNCHPLATFORM_CONF_DIR/resources/punch folder.
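
As an illustration, a file placed in one of these pattern folders simply holds named grok patterns, one per line. Here is a hypothetical example (the pattern name is illustrative; %{MONTHDAY}, %{MONTH}, %{YEAR} and %{TIME} are standard base grok patterns):

APACHE_TIMESTAMP %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME}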

Placeholders (described next) only work if the tenant attribute is set in the topology file.

You can also store these files relative to the configuration directory, tenant directory or channel directory by using (respectively) the %{conf}%, %{tenant}% or %{channel}% placeholder. Note that %{conf}% is the one used by default, so %{conf}%/standard/common/input.punch is equivalent to standard/common/input.punch. Also note that punchlets can be referenced by their absolute path; any path starting with a / is considered absolute.
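
To sum up, the placeholders resolve as follows, using the tenant mytenant and channel apache from the example below (the %{tenant}% mapping is inferred by analogy with the two others):

%{conf}%     resolves to  $PUNCHPLATFORM_CONF_DIR/resources/punch
%{tenant}%   resolves to  $PUNCHPLATFORM_CONF_DIR/tenants/mytenant
%{channel}%  resolves to  $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels/apache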

For example, if your resource file (say) taxonomy.json is located under:

$PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels/apache/resources/taxonomy.json

Here is the configuration you must use:

{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_json_resources": [
      "%{channel}%/resources/taxonomy.json"
    ],
    "punchlet": [
      "%{conf}%/standard/common/input.punch",
      ...
    ]
  },
  ...
}

Warning

Use absolute paths with caution, preferably only for development, since these files will likely not be stored in your platform configuration directory.

Error Handling

A punchlet can raise an exception, either explicitly or because it encounters a runtime error. Most often you cannot afford to lose the input data, and must arrange to get it back together with the exception information and forward it to a backend for later reprocessing. Doing that on the punchplatform is quite easy. Simply add an additional publish stream to instruct the bolt to emit the error information in the topology:

{
  "type": "punch_bolt",
  "bolt_settings": {
    ...
  },
  "storm_settings": {
    "subscribe": [
      ...
    ],
    "publish": [
      {
        "stream": "logs",
        "fields": [
          "log",
          "_ppf_id",
          "_ppf_timestamp"
        ]
      },
      {
        "stream": "_ppf_errors",
        "fields": [
          "_ppf_id",
          "_ppf_error_document",
          "_ppf_error_message",
          "_ppf_timestamp"
        ]
      }
    ]
  }
}

The _ppf_errors stream and the _ppf_error_document and _ppf_error_message fields are reserved. What this causes is the emission of an additional storm tuple in the topology, one that you can handle the same way you handle regular data. It basically contains the exception message (which includes the input data). Because it is emitted just like any other data, you can arrange to have it forwarded up to the final destination where you need it, to save it and reprocess it later: archiving, elasticsearch or any other.

Info

The generated error field is a ready-to-use JSON document. Most often you simply need to forward it to save it somewhere. If you would like to enrich or normalise its content in some way, simply deploy a punch bolt that subscribes to it. Your punchlet will then be capable of changing its content. But in turn that punchlet should not fail. Bottom line: do this only if strictly required, and if so pay extra attention to write an error handling punchlet that can never fail.
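
For instance, here is a minimal sketch of such an error handling bolt. The component names are hypothetical, and the inline punchlet is deliberately trivial so that it cannot fail:

{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_code": "{ print(root); }"
  },
  "storm_settings": {
    "component": "error_handling_bolt",
    "subscribe": [
      {
        "component": "my_punch_bolt",
        "stream": "_ppf_errors"
      }
    ],
    "publish": [
      {
        "stream": "_ppf_errors",
        "fields": [
          "_ppf_id", "_ppf_error_document", "_ppf_error_message", "_ppf_timestamp"
        ]
      }
    ]
  }
}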

Additional fields can be published in the error stream. They can be either copied from the input stream (any field name is supported, as long as it is present in the subscribed stream), or generated by the PunchBolt (see the example after this list):

  • _ppf_timestamp : the standard input timestamp (long number of milliseconds since 1/1/1970)
  • _ppf_id : the unique id (string) of the input document
  • _ppf_platform : the unique id of the punchplatform instance
  • _ppf_tenant : the tenant name of the current channel
  • _ppf_channel : the name of the channel containing the failed punchlet
  • _ppf_topology : the name of the topology containing the failed punchlet
  • _ppf_component : the component name of the PunchBolt containing the failed punchlet
  • _ppf_error_message : the exception message or class that the punchlet raised at failure time.
  • _ppf_error_document : the JSON-escaped string document being processed when the punchlet failure occurred
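
For example, to ship the complete platform context along with each error, publish all of these fields on the error stream:

{
  "stream": "_ppf_errors",
  "fields": [
    "_ppf_id",
    "_ppf_error_document",
    "_ppf_error_message",
    "_ppf_timestamp",
    "_ppf_platform",
    "_ppf_tenant",
    "_ppf_channel",
    "_ppf_topology",
    "_ppf_component"
  ]
}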

More than one Punchlet

It is extremely common to deploy a straight sequence of punchlets. You can do that using one punch bolt per punchlet, but a more efficient solution is to deploy all the punchlets as a sequence in the same punch bolt. This avoids extra serialisation between bolts and can save considerable CPU resources. The way you do this is very simple:

{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlets": [
      "standard/common/input.punch",
      "standard/common/parsing_syslog_header.punch",
      "standard/sourcefire/parsing.punch",
      "standard/common/geoip.punch"
    ]
  },
  ...
}

If one of the punchlets in the sequence raises an exception, the following ones are skipped.
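
For example, here is a sketch of a punchlet that aborts the sequence on purpose. It assumes a raise operator to explicitly raise an exception; refer to the punch language documentation for the exact syntax:

{
    // raising here means the punchlets configured after this one in the
    // sequence are skipped; the error is emitted on the _ppf_errors
    // stream if that stream is published
    raise("refusing to process this document");
}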

Latency Tracking

Just like all bolts, you can subscribe to and publish the _ppf_metrics stream and _ppf_latency field to make your bolt part of the latency tracking path. Here is an example.

{
  "type": "punch_bolt",
  "bolt_settings": {
    ...
  },
  "storm_settings": {
    "subscribe": [
      {
        "component": "kafka_spout",
        "stream": "logs"
      },
      {
        "component": "kafka_spout",
        "stream": "_ppf_metrics"
      }
    ],
    "publish": [
      {
        "stream": "logs",
        ...
      },
      {
        "stream": "_ppf_errors",
        ...
      },
      {
        "stream": "_ppf_metrics",
        "fields": [
          "_ppf_latency"
        ]
      }
    ]
  }
}

Info

The only special thing about the _ppf_metrics stream and _ppf_latency field is that they do not traverse your punchlets. You do not have to explicitly protect your punchlet code logic to ignore them. Make sure you understand the spout and bolt stream and field fundamental concepts.

TroubleShooting and Debugging

If you write punchlets, make sure you are aware of the many resources available to easily test them. Check the punch language documentation chapters.

A good trick to know, should you have issues with streams/fields not being emitted the way you expect, is to add a small punchlet in the chain that simply prints out and forwards the received data. That is easily done by deploying an inline punchlet in a bolt you add to your topology:

{
    "type": "punch_bolt",
    "bolt_settings": {
        "punchlet_code": "{ print(root); }"
    },
    "storm_settings": {
        ...
    }
}
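
When it runs, this punchlet prints the complete traversing tuple, then forwards it unchanged. Should you want to inspect a single stream only, a variant could be (a sketch, assuming a "logs" input stream):

"punchlet_code": "{ print([logs]); }"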

Make sure you are aware of the following punch features; they dramatically ease your punchlet coding experience:

  • inline punchlets
  • the Sublime Text editor
  • the punch log injector
  • the Kibana inline plugin to execute grok patterns and punchlets
  • punchplatform-topology.sh
  • topologies log levels