# PunchBolt

A punch bolt executes a punchlet on the fly. This bolt cannot be used to communicate with an external application; it is strictly internal to a topology. Its configuration looks like this:
```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet": "standard/Apache_HTTP_Server/apache.punch"
  },
  "storm_settings": {
    "component": "my_punch_bolt",
    "subscribe": [
      {
        "component": "kafka_spout",
        "stream": "logs",
        "grouping": "localOrShuffle"
      }
    ],
    "publish": [
      {
        "stream": "logs",
        "fields": ["log", "local_uuid", "local_timestamp"]
      },
      {
        "stream": "_ppf_errors",
        "fields": ["_ppf_error"]
      }
    ]
  }
}
```
The key concept to understand is the relationship between the punchlet and the subscribed and published streams/fields. In this example your punchlet will receive punch tuples corresponding to the Storm tuples received on the subscribed stream. After it returns, your punchlet will have produced arbitrary punch tuples (depending on the punchlet logic), but the punch bolt will only emit the tuples/fields that match the publish section.
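To make this concrete, here is a minimal punchlet sketch (the field names are hypothetical) illustrating that filtering:

```
{
    // The punchlet is free to add any field it likes to the "logs" tuple...
    [logs][parsed][verb] = "GET";

    // ...but only the fields declared in the publish section above
    // ("log", "local_uuid", "local_timestamp") are actually emitted;
    // [logs][parsed] would be silently dropped by the bolt.
}
```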
The `_ppf_errors` stream is explained below; it is related to error handling.
## Punchlets
The `punchlet` property refers to the punchlet you want to execute in the bolt. This property is a path, relative to the `$PUNCHPLATFORM_CONF_DIR/resources/punch/punchlet` folder.

Some punchlets require resource files, typically when they use the `findByKey` or `findByInterval` punch operators. Others use Siddhi rules that must likewise be loaded. To add resource files to your punchlet, proceed as follows:
```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_json_resources": [
      "standard/Apache_HTTP_Server/enrichement.json"
    ],
    "punchlet_rule_resources": [
      "standard/common/detection.rule"
    ],
    "punchlet": "standard/Apache_HTTP_Server/enrichement.punch"
  }
}
```
- The `punchlet_json_resources` property lists the required JSON files.
- The `punchlet_rule_resources` property lists the rules.
All these will be loaded prior to the punchlet execution. These properties must contain paths relative to the `$PUNCHPLATFORM_CONF_DIR/resources/punch` folder.
You can also store these files relative to the configuration directory, tenant directory or channel directory by using (respectively) the `%{conf}%`, `%{tenant}%` or `%{channel}%` placeholder. Placeholders only work if the `tenant` attribute is filled in the topology file. For example, if your resource file (say) `taxonomy.json` is located under:
```
$PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels/apache/resources/taxonomy.json
```
here is the configuration you must use:
```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_json_resources": [
      "%{channel}%/resources/taxonomy.json"
    ],
    "punchlet": "standard/Apache_HTTP_Server/enrichement.punch"
  }
}
```
Last, you can also use absolute paths starting with `/`.
Warning

Use this with caution, preferably only for development, since these files will likely not be stored in your git configuration directory, i.e. not properly managed by the PunchPlatform configuration management.
## Error Handling
A punchlet can raise an exception, either explicitly or because it encounters a runtime error. Most often you cannot afford to lose the input data and must arrange to get it back, together with the exception information, and forward it to a backend for later reprocessing. Doing that on the PunchPlatform is quite easy: simply add an additional publish stream to instruct the bolt to emit the error information in the topology:
```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet": "standard/Apache_HTTP_Server/apache.punch"
  },
  "storm_settings": {
    "subscribe": [
      {
        "component": "kafka_spout",
        "stream": "logs",
        "grouping": "localOrShuffle"
      }
    ],
    "publish": [
      {
        "stream": "logs",
        "fields": ["log", "local_uuid", "local_timestamp"]
      },
      {
        "stream": "_ppf_errors",
        "fields": ["_ppf_error_document"]
      }
    ]
  }
}
```
The `_ppf_errors` stream and `_ppf_error_document` field are reserved. This causes the emission of a single-field Storm tuple in the topology, which you can handle the same way you handle regular data. It basically contains the exception message (which includes the input data). Because it is emitted just like any other data, you can arrange to have it forwarded up to whatever final destination you need in order to save it and reprocess it later: archiving, Elasticsearch or any other.
Info

The generated error field is a ready-to-use json document. Most often you simply need to forward it to save it somewhere. If you would like to enrich or normalise its content in some way, simply deploy a punch bolt that subscribes to it; your punchlet will then be capable of changing its content. But in turn that punchlet should not fail. Bottom line: do this only if strictly required, and if so pay extra attention to write an error-handling punchlet that can never fail.
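For illustration, here is a minimal sketch of such an error-handling bolt. The component names and the `normalise_errors.punch` punchlet path are hypothetical; the point is simply that the bolt subscribes to the upstream error stream and re-publishes it:

```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet": "standard/common/normalise_errors.punch"
  },
  "storm_settings": {
    "component": "error_normaliser",
    "subscribe": [
      {
        "component": "my_punch_bolt",
        "stream": "_ppf_errors",
        "grouping": "localOrShuffle"
      }
    ],
    "publish": [
      {
        "stream": "_ppf_errors",
        "fields": ["_ppf_error"]
      }
    ]
  }
}
```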
Additional fields can be published in the error stream. They can either be copied from the input stream (any field name is supported, as long as it is present in the subscribed stream) or generated by the PunchBolt:
- `_ppf_timestamp`: the standard input timestamp (long number of milliseconds since 1/1/1970)
- `_ppf_error_message`: the exception message or class that the punchlet raised at failure time
- `_ppf_id`: the unique id (string) of the input document
- `_ppf_platform`: the unique id of the punchplatform instance
- `_ppf_tenant`: the tenant name of the current channel
- `_ppf_channel`: the name of the channel containing the failed punchlet
- `_ppf_topology`: the name of the topology containing the failed punchlet
- `_ppf_component`: the component name of the PunchBolt containing the failed punchlet
- `_ppf_error`: the json document at the start of the failed punchlet step
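As a sketch, reusing the topology above, enriching the error stream with some of these fields only requires extending the publish section:

```json
"publish": [
  {
    "stream": "logs",
    "fields": ["log", "local_uuid", "local_timestamp"]
  },
  {
    "stream": "_ppf_errors",
    "fields": ["_ppf_error", "_ppf_timestamp", "_ppf_error_message", "_ppf_tenant"]
  }
]
```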
## More than one Punchlet
It is extremely common to deploy a straight sequence of punchlets. You can do that using one punch bolt per punchlet, but a more efficient solution is to deploy all the punchlets as a sequence in the same punch bolt. This avoids extra serialisation between bolts and can save considerable cpu resources. The way you do this is very simple:
```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlets": [
      "standard/common/input.punch",
      "standard/common/parsing_syslog_header.punch",
      "standard/sourcefire/parsing.punch",
      "standard/common/geoip.punch"
    ]
  },
  ...
}
```
If one of the punchlets in the sequence raises an exception, the following ones are skipped.
## Latency Tracking
Just like all bolts, you can subscribe to and publish the `_ppf_metrics` and `_ppf_latency` tuples to make your bolt part of the latency tracking path. Here is an example:
```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet": "standard/Apache_HTTP_Server/apache.punch"
  },
  "storm_settings": {
    "subscribe": [
      {
        "component": "kafka_spout",
        "stream": "logs",
        "grouping": "localOrShuffle"
      },
      {
        "component": "kafka_spout",
        "stream": "_ppf_metrics",
        "grouping": "localOrShuffle"
      }
    ],
    "publish": [
      {
        "stream": "logs",
        "fields": ["log", "local_uuid", "local_timestamp"]
      },
      {
        "stream": "_ppf_errors",
        "fields": ["_ppf_error"]
      },
      {
        "stream": "_ppf_metrics",
        "fields": ["_ppf_latency"]
      }
    ]
  }
}
```
Info

The only special thing about `_ppf_metrics` and `_ppf_latency` tuples is that they do not traverse your punchlets; you do not have to explicitly protect your punchlet code logic to ignore them. Make sure you understand the spouts and bolts stream and field fundamental concepts.
## Troubleshooting and Debugging
If you write punchlets, make sure you are aware of the many resources available to easily test them. Check the punch language documentation chapters.
A good trick to know, should you have issues with streams/fields not being emitted the way you expect, is to add a small punchlet in the chain that simply prints out and forwards the received data. That is easy to do by deploying an inline punchlet in a bolt you add to your topology:
```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_code": "{ print(root); }"
  },
  "storm_settings": {
    ...
  }
}
```
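As a sketch, such a debugging bolt could be fully wired like this (reusing the component and stream names from the earlier examples; adapt them to your own topology). Since the punchlet changes nothing, every subscribed field is simply printed and passed through:

```json
{
  "type": "punch_bolt",
  "bolt_settings": {
    "punchlet_code": "{ print(root); }"
  },
  "storm_settings": {
    "component": "debug_bolt",
    "subscribe": [
      {
        "component": "kafka_spout",
        "stream": "logs",
        "grouping": "localOrShuffle"
      }
    ],
    "publish": [
      {
        "stream": "logs",
        "fields": ["log", "local_uuid", "local_timestamp"]
      }
    ]
  }
}
```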
Make sure you are aware of the following punch features; they dramatically ease your punchlet coding experience:
- inline punchlets
- sublime text editor
- punch log injector
- the kibana inline plugin to execute grok and punchlets
- punchplatform-topology.sh
- topologies log levels