Nifi Punch Processor¶
Abstract
This Nifi processor lets you run punchlets on FlowFiles in Apache Nifi. See this blog for a brief introduction.
Warning
Check compatibility with your Nifi version in the Versioning section.
Before you start¶
First, you need to install Nifi. Download the latest release and extract the archive to the location from which you want to run the application.
In the following instructions, Nifi is assumed to be installed in your home directory.
Install PunchProcessor¶
You can install the PunchProcessor the same way you would install any other custom processor in Nifi.
Download the latest release of the processor from the Punch Download Page and copy the provided nar to Nifi's lib folder before starting Nifi, or add it to the extension folder of a running Nifi to use auto-loading.
PunchProcessor should then be available in your processors list with the tags thales, punch and punchlet.
Properties¶
Resources can be provided in three different ways :
- Inline code : the code resource is provided directly.
- Paths (files/dirs) : the code resource is read from the provided files or directories.
- Http urls : the code resource is taken from the body of the URL response.
As a result, each type of resource has two properties : the input mode and the value.
Name | Type | Status | Default value | Description |
---|---|---|---|---|
Punchlets input mode | Options | Required | Paths (files/dirs) | Punchlets input mode. |
Punchlets | String | Required | - | Punchlets value. |
Grok patterns input mode | Options | Optional | Paths (files/dirs) | Grok patterns input mode. |
Grok patterns | String | Optional | - | Grok patterns value. |
Siddhi rules input mode | Options | Optional | Paths (files/dirs) | Siddhi rules input mode. |
Siddhi rules | String | Optional | - | Siddhi rules value. |
Json resources input mode | Options | Optional | Paths (files/dirs) | Json resources input mode. |
Json resources | String | Optional | - | Json resources value. |
With Attributes | Boolean | Optional | false | Make FlowFile attributes available to the Punchlet. |
Possible routes | String | Optional | - | Comma-separated list of routes to which FlowFiles can be sent. |
Group by route | Boolean | Optional | false | Group output tuples into a single FlowFile per route. |
Example :¶
With resources located in /home/user/nifi/standalone/, a configuration could look like :
Punchlets input mode : Inline Code
Punchlets : { print([content]); }
Grok patterns input mode : Paths (files/dirs)
Grok patterns : /home/user/nifi/standalone/resources/patterns
Siddhi rules input mode : Paths (files/dirs)
Siddhi rules : /home/user/nifi/standalone/resources/standard/common/detection.rule
Json resources input mode : Paths (files/dirs)
Json resources : /home/user/nifi/standalone/resources/standard/apache_httpd/http_codes.json,
/home/user/nifi/standalone/resources/standard/apache_httpd/taxonomy.json
With Attributes : false
Possible routes : red,blue,green
Group by route : false
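To make the three input modes more concrete, here is a minimal Python sketch of how an (input mode, value) pair could be turned into resource text. It is not the processor's actual code: the function name `resolve_resource` is hypothetical, and the handling of comma-separated paths and of directories (only files directly inside a directory are read) is an assumption based on the example above.

```python
from pathlib import Path
from urllib.request import urlopen


def resolve_resource(mode: str, value: str) -> list[str]:
    """Return the resource texts for one (input mode, value) property pair."""
    if mode == "Inline code":
        # The value is the resource itself.
        return [value]
    if mode == "Paths (files/dirs)":
        texts = []
        # Assumption: several paths are separated by commas.
        for raw in value.split(","):
            path = Path(raw.strip())
            if path.is_dir():
                # Assumption: only files directly inside the directory are read.
                files = sorted(p for p in path.iterdir() if p.is_file())
            else:
                files = [path]
            texts.extend(f.read_text(encoding="utf-8") for f in files)
        return texts
    if mode == "Http urls":
        # The resource is the body of the HTTP response.
        with urlopen(value) as response:
            return [response.read().decode("utf-8")]
    raise ValueError(f"unknown input mode: {mode}")
```

For instance, `resolve_resource("Inline code", "{ print([content]); }")` returns the punchlet text unchanged, wrapped in a one-element list.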
Validation¶
The punchlet is compiled during processor validation, before the processor executes. If compilation raises an exception, validation fails and the processor cannot be started.
Execution¶
Here's how the processor works :
- The input FlowFile content is parsed to extract each line.
- Each line is converted into a Tuple with two main keys :
  - content : FlowFile content. If Json formatted, it is parsed and stored as a Tuple. If it is a raw string, it is stored as a string.
  - attributes : Optional. FlowFile attributes, stored as a key-value map.
- Each Tuple is processed using the punchlet.
- The result tuple is converted to a Json string.
- If "Group by route" is :
  - false : A new FlowFile is created for each processed line and sent to the specified route, or to "default" if none is provided.
  - true : One FlowFile is created for each specified route, and tuples are appended to the matching FlowFile. The aggregated FlowFiles, one per route, are transferred once all lines have been processed.
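The steps above can be modeled with a short Python sketch. It only simulates the described behavior and is not the processor's implementation: `run_processor` is a hypothetical name, the punchlet is replaced by a plain Python callable returning a result tuple and an optional route, and treating only Json objects as "Json formatted" is an assumption.

```python
import json
from collections import defaultdict


def run_processor(flowfile_content, punchlet, attributes=None,
                  with_attributes=False, group_by_route=False):
    """Return a list of (route, flowfile_text) pairs, one per output FlowFile."""
    grouped = defaultdict(list)  # route -> Json lines, used when grouping
    outputs = []
    for line in flowfile_content.splitlines():
        # Build the input tuple: parsed Json object if possible, raw string otherwise.
        try:
            parsed = json.loads(line)
            content = parsed if isinstance(parsed, dict) else line
        except json.JSONDecodeError:
            content = line
        input_tuple = {"content": content}
        if with_attributes:
            input_tuple["attributes"] = dict(attributes or {})
        # The punchlet stand-in returns the result tuple and an optional route.
        result, route = punchlet(input_tuple)
        route = route or "default"
        serialized = json.dumps(result)
        if group_by_route:
            grouped[route].append(serialized)
        else:
            outputs.append((route, serialized))
    if group_by_route:
        # Aggregated FlowFiles are emitted once every line has been processed.
        outputs = [(route, "\n".join(lines)) for route, lines in grouped.items()]
    return outputs
```

With a pass-through stand-in such as `lambda t: (t, None)`, a two-line input yields two FlowFiles on the "default" route, or a single two-line FlowFile when `group_by_route=True`.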
Routing¶
This processor allows you to dynamically create as many output relationships as you like.
The output route of a tuple is set in the punchlet, using the Nifi operator. For example :
nifi().addOutputRoute("red").into(root);
If Group by route is set to true, the tuple is added to the FlowFile attached to this route.
If it is set to false, a new FlowFile is created and sent to this route.
Examples¶
Let's take a simple example. As input, we have a multi-line FlowFile :
user=punch value=one
user=punch value=two
user=nifi value=three
value=four
The punchlet would look like :
{
kv().on([content]).into(root);
remove([content]);
if(![user].isEmpty()){
nifi().addOutputRoute([user]).into(root);
}
}
The properties to set would be :
Punchlets input mode : Inline Code
Punchlets : <Code above>
Possible routes : punch,nifi
The output tuple for the first line would look like :
{"user":"punch","value":"one"}
We would have 2 FlowFiles in "punch" route, 1 FlowFile in "nifi" route and 1 FlowFile in "default" route, with a single json tuple each.
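The routing in this example can be checked with a small Python simulation. This is a sketch, not punchlet code: the `punchlet` function below is a hypothetical stand-in that mimics the key-value parsing and the route selection for space-separated `key=value` pairs only.

```python
import json
from collections import Counter

LINES = [
    "user=punch value=one",
    "user=punch value=two",
    "user=nifi value=three",
    "value=four",
]


def punchlet(line):
    """Python stand-in for the punchlet above: parse key=value pairs, pick a route."""
    # Rough equivalent of kv().on([content]).into(root) for space-separated pairs.
    root = dict(pair.split("=", 1) for pair in line.split())
    # Equivalent of the isEmpty() check: fall back to "default" without a user.
    route = root.get("user") or "default"
    return route, json.dumps(root, separators=(",", ":"))


results = [punchlet(line) for line in LINES]
flowfiles_per_route = Counter(route for route, _ in results)

print(results[0])                 # ('punch', '{"user":"punch","value":"one"}')
print(dict(flowfiles_per_route))  # {'punch': 2, 'nifi': 1, 'default': 1}
```

The counts match the distribution described above when Group by route is false: one FlowFile per input line.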
Group outputs :
Let's say we prefer to have a single FlowFile in the "punch" route. We just need to set Group by route : true.
We would now have a two-line FlowFile in the "punch" route, with two json tuples in it.
Work with attributes :
Let's say we want to use the filename. We just need to set With Attributes : true.
The filename can now be read in the punchlet using [attributes][filename].
Versioning¶
Not every PunchProcessor version is compatible with every Nifi version :
- PunchProcessor versions 2.x only work with Nifi 1.11 or above.
- PunchProcessor versions 1.x work on older Nifi versions but do not provide auto-loading (a Nifi restart is required) or several versions of the processor side by side (only one version of the processor per Nifi instance).