Punchlets inside Punchlines
This chapter explains how to run punchlets in a punchline. Punchlets are embedded in the "punch node", which you can include in streaming punchlines.
Using punchlets, you can take several actions on the traversing data:
- transform: parsing, enriching.
- route: towards a next bolt or a third-party application such as Kafka or Elasticsearch
- filter data: drop some data depending on its content
- generate data: generate alert or correlation events
The rest of this chapter summarizes the configuration required to run your punchlet.
Basics
The simplest configuration looks like this:
{
"type" : "punchlet_node",
"settings" : {
"punchlet" : "standard/common/input.punch"
},
"storm_settings" : { ... }
}
The "punchlet" property refers to a punchlet file. It is looked up in the $PUNCHPLATFORM_CONF_DIR/resources/punch directory. For the example above, the resolved path would be
$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/input.punch
The same lookup is used to load a punchlet and its (potential) companion resource files. Alternatively, you can give an absolute path, starting with a '/' character.
To learn about the available usages, see the Punch Bolt documentation section.
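The lookup rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the platform's actual implementation: the helper name `resolve_punchlet_path` and the `/opt/conf` value standing in for $PUNCHPLATFORM_CONF_DIR are assumptions.

```python
import posixpath


def resolve_punchlet_path(punchlet: str, conf_dir: str) -> str:
    """Hypothetical helper mimicking the punchlet lookup rule."""
    if punchlet.startswith("/"):
        # Absolute paths are used as-is.
        return punchlet
    # Relative paths are looked up under <conf_dir>/resources/punch.
    return posixpath.join(conf_dir, "resources", "punch", punchlet)


# "/opt/conf" is an illustrative stand-in for $PUNCHPLATFORM_CONF_DIR.
relative = resolve_punchlet_path("standard/common/input.punch", "/opt/conf")
absolute = resolve_punchlet_path("/tmp/input.punch", "/opt/conf")
print(relative)
print(absolute)
```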
Grok Patterns
Punch comes with a grok operator, which uses standard grok patterns loaded from
$PUNCHPLATFORM_CONF_DIR/resources/punch/pattern
If you need more patterns, you can add pattern files there. If you need your own patterns, visible only to some topologies, channels or tenants, you can put them elsewhere and configure your topology as follows:
{
"type" : "punchlet_node",
"settings" : {
"punchlet_grok_pattern_dirs" : [
"pattern",
"myadditionalpatterns"
],
"punchlet" : "mydirectory/mypunchlet.punch"
},
"storm_settings" : { ... }
}
Note that:
- all patterns are loaded from all files found in the directories you define
- patterns are loaded in order, so you can override a common pattern by defining it in one of your own pattern files
- if you use this configuration and only want to add a few patterns while still relying on the common set, you must explicitly list the default directory as well (i.e. "pattern", as in the example above).
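The effect of the loading order can be illustrated with a small Python sketch. It assumes the usual grok pattern file layout (one `NAME regex` definition per line) and assumes that the definition loaded last wins. The function name `load_patterns` and the sample pattern values are made up for the illustration.

```python
def load_patterns(pattern_files):
    """Merge grok pattern definitions; later files override earlier ones.

    pattern_files: list of file contents (strings), in loading order.
    """
    patterns = {}
    for content in pattern_files:
        for line in content.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, _, regex = line.partition(" ")
            patterns[name] = regex.strip()
    return patterns


# Illustrative contents: a common pattern file, then a custom one.
common = "USERNAME [a-zA-Z0-9._-]+\nINT [+-]?[0-9]+\n"
custom = "# site-specific override\nUSERNAME [a-z]+\n"

merged = load_patterns([common, custom])
print(merged)
```

Under these assumptions, `USERNAME` ends up with the custom definition while `INT` keeps the common one. This is why the default directory has to stay in the list if you still rely on the common set.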
JSON Resource Files
You can associate JSON resource files with your punchlet. They are loaded and made available to it as Punch Tuples. This feature makes it very simple to perform normalisation or enrichment. Here is an example. Say you have a file named intervals.json with the following contents:
[
{ "user" : "bob", "min" : 0, "max" : 10 },
{ "user" : ["ted", "bob"], "min" : 5, "max" : 7 },
{ "user" : "dimi", "min" : 11, "max" : 20 },
{ "user" : "flo", "min" : 8, "max" : 12 }
]
Put that file somewhere in the $PUNCHPLATFORM_CONF_DIR/resources/punch directory, for example at $PUNCHPLATFORM_CONF_DIR/resources/punch/mydirectory/intervals.json.
You can then refer to it in your punchlet as in the following example:
Tuple result;
Tuple intervals = getResourceTuple("intervals");
findByInterval(intervals).on(6).into(result);
In this example you'll get the following in the result Tuple:
[
{ "user" : "bob", "min" : 0, "max" : 10 },
{ "user" : ["ted", "bob"], "min" : 5, "max" : 7 }
]
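To make the selection logic explicit, here is a rough Python equivalent of the lookup. It is a sketch of the behaviour shown in this example, not of the Punch operator itself: the function name `find_by_interval` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def find_by_interval(intervals, value):
    """Return the entries whose [min, max] range contains value.

    Assumption: both bounds are inclusive.
    """
    return [entry for entry in intervals if entry["min"] <= value <= entry["max"]]


intervals = [
    {"user": "bob", "min": 0, "max": 10},
    {"user": ["ted", "bob"], "min": 5, "max": 7},
    {"user": "dimi", "min": 11, "max": 20},
    {"user": "flo", "min": 8, "max": 12},
]

result = find_by_interval(intervals, 6)
print(result)
```

With the value 6, only the first two entries match, which is the result shown above.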
Note
The Punch findByKey and findByInterval operators require an input Tuple loaded from such resource files, but you can use resource Tuples in many other ways.
Coming back to the way you refer to a resource tuple in a punchlet: you simply use the short name of the file, without the '.json' extension. You must however make sure the corresponding file is loaded and available to your punchlet. You do that by configuring the running environment of your punchlet.
For example, if you run a Storm topology, you need the following configuration:
{
"type": "punchlet_node",
"settings": {
# by default, JSON file paths are relative to
# $PUNCHPLATFORM_CONF_DIR/resources/punch
"punchlet_json_resources": [
"mydirectory/intervals.json"
],
# punchlet paths are also relative to
# $PUNCHPLATFORM_CONF_DIR/resources/punch
"punchlet": "mydirectory/mypunchlet.punch"
},
"storm_settings": {
"component": "punchlet_node",
"subscribe": [
...
],
"publish": [
...
]
}
}
Note
The JSON file paths in the topology configuration file are relative to the $PUNCHPLATFORM_CONF_DIR/resources/punch folder.
You can load resource files from other directories using the %{conf}%, %{tenant}% and %{channel}% directives, as follows:
{
"type": "punchlet_node",
"settings": {
"punchlet_json_resources": [
# by default, JSON file paths are relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
# this one would be located at:
# $PUNCHPLATFORM_CONF_DIR/resources/punch/mydirectory/intervals.json
"mydirectory/intervals.json",
# $PUNCHPLATFORM_CONF_DIR/mydirectory/intervals.json
"%{conf}%/mydirectory/intervals.json",
# $PUNCHPLATFORM_CONF_DIR/tenants/<this_tenant>/mydirectory/intervals.json
"%{tenant}%/mydirectory/intervals.json",
# $PUNCHPLATFORM_CONF_DIR/tenants/<this_tenant>/channel/<this_channel>/mydirectory/intervals.json
"%{channel}%/mydirectory/intervals.json",
# Absolute paths are accepted but make no sense in a production environment;
# they are only useful for development
"/tmp/intervals.json"
],
# punchlet paths are also relative to $PUNCHPLATFORM_CONF_DIR/resources/punch
"punchlet": "mydirectory/mypunchlet.punch"
},
"storm_settings": {
...
}
}
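The way each entry maps to a location on disk can be summarised with the Python sketch below. It simply mirrors the comments in the configuration above and is not the platform's resolution code. The function name `resolve_resource_path` and the sample values (`/opt/conf`, `mytenant`, `mychannel`) are assumptions.

```python
import posixpath


def resolve_resource_path(path, conf_dir, tenant, channel):
    """Hypothetical helper expanding the directives as the comments describe."""
    tenant_dir = posixpath.join(conf_dir, "tenants", tenant)
    roots = {
        "%{conf}%": conf_dir,
        "%{tenant}%": tenant_dir,
        # The "channel" segment follows the comment in the example above.
        "%{channel}%": posixpath.join(tenant_dir, "channel", channel),
    }
    for directive, root in roots.items():
        if path.startswith(directive + "/"):
            return posixpath.join(root, path[len(directive) + 1:])
    if path.startswith("/"):
        return path  # absolute path, used as-is
    # Default: relative to <conf_dir>/resources/punch.
    return posixpath.join(conf_dir, "resources", "punch", path)


entries = [
    "mydirectory/intervals.json",
    "%{conf}%/mydirectory/intervals.json",
    "%{tenant}%/mydirectory/intervals.json",
    "%{channel}%/mydirectory/intervals.json",
    "/tmp/intervals.json",
]
# Illustrative values standing in for the real environment.
resolved = [resolve_resource_path(e, "/opt/conf", "mytenant", "mychannel") for e in entries]
for location in resolved:
    print(location)
```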
Regular Resource Files
You can make regular text files visible to punchlets and use them in any way you see fit, for example to work with email templates.
String template = getResource("mydirectory/mail.html");
For this to work you need the following topology configuration:
{
"type": "punchlet_node",
"settings": {
"punchlet_file_resources": [
"mydirectory/mail.html"
],
"punchlet": "mydirectory/mypunchlet.punch"
},
"storm_settings": {
...
}
}
Property Resource Files
You can also load Java property files. All properties are made available to your punchlet using the getProperties() method.
Properties props = getProperties();
For this to work you need the following topology configuration:
{
"type": "punchlet_node",
"settings": {
"punchlet_property_resources": [
"mydirectory/properties"
],
"punchlet": "mydirectory/mypunchlet.punch"
},
"storm_settings": { ... }
}
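As a reminder of what such a file contains, the Python sketch below parses a deliberately simplified subset of the Java properties format: `key=value` lines, with blank lines and lines starting with `#` or `!` ignored. The real format also supports other separators, escapes and line continuations, which are not handled here. The function name `parse_properties` and the sample keys are invented for the illustration.

```python
def parse_properties(text):
    """Parse a simplified subset of the Java properties format."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Hypothetical content for a property resource file.
sample = "# mail settings\nmail.subject = Alert\nmail.to=ops@example.org\n"
props = parse_properties(sample)
print(props)
```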