Nifi Punch Processor

Abstract

This Nifi Processor lets you run punchlets on FlowFiles in Apache Nifi. See this blog for a brief introduction.

Warning

Check compatibility with your Nifi version in the Versioning section.

Before you start

First, you need to install Nifi. Download the latest release and simply extract the archive to the location that you wish to run the application from.

In the following instructions, Nifi is assumed to be installed in your home directory.

Install PunchProcessor

You can install the PunchProcessor the same way you would install any other custom processor in Nifi.

Download the latest release of the processor from the Punch Download Page. Then either copy the provided nar file to Nifi's lib folder before starting Nifi, or add it to the extension folder of a running Nifi to use auto-loading.

PunchProcessor should then be available in your processors list under the tags thales, punch and punchlet.

Properties

Resources can be provided in 3 different ways :

  • Inline code : the code resource is directly provided.
  • Paths (files/dirs) : the code resource is extracted from files/dirs provided.
  • Http urls : the code resource is provided from url response body.

As a result, each type of resource has 2 properties : the input mode and the value.
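The pairing of an input mode with a value can be pictured with a minimal Python sketch. This is not the processor's actual code: the function name `load_resources` is hypothetical, the comma separator for several urls is an assumption (the configuration example only shows it for paths), and so is the rule that a directory contributes every file directly inside it.

```python
from pathlib import Path
from urllib.request import urlopen

# Mode labels mirror the three options listed above.
INLINE = "Inline code"
PATHS = "Paths (files/dirs)"
HTTP = "Http urls"


def load_resources(mode, value):
    """Hypothetical helper: return the text of every resource described by (mode, value)."""
    if mode == INLINE:
        # The value is the resource itself, so it is returned untouched.
        return [value]
    # Several paths are comma separated; doing the same for urls is an assumption.
    items = [item.strip() for item in value.split(",") if item.strip()]
    texts = []
    if mode == PATHS:
        for item in items:
            path = Path(item)
            if path.is_dir():
                # Assumption: a directory contributes each regular file directly inside it.
                files = sorted(p for p in path.iterdir() if p.is_file())
            else:
                files = [path]
            texts.extend(f.read_text(encoding="utf-8") for f in files)
        return texts
    if mode == HTTP:
        for url in items:
            # The resource is the body of the url response.
            with urlopen(url) as response:
                texts.append(response.read().decode("utf-8"))
        return texts
    raise ValueError("unknown input mode: " + mode)
```

For instance, `load_resources(INLINE, "{ print([content]); }")` returns the code as a one-element list, whereas the `PATHS` mode reads the same kind of content from disk.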

Name | Type | Status | Default value | Description
Punchlets input mode | Options | Required | Paths (files/dirs) | Punchlets input mode.
Punchlets | String | Required | - | Punchlets value.
Grok patterns input mode | Options | Optional | Paths (files/dirs) | Grok patterns input mode.
Grok patterns | String | Optional | - | Grok patterns value.
Siddhi rules input mode | Options | Optional | Paths (files/dirs) | Siddhi rules input mode.
Siddhi rules | String | Optional | - | Siddhi rules value.
Json resources input mode | Options | Optional | Paths (files/dirs) | Json resources input mode.
Json resources | String | Optional | - | Json resources value.
With Attributes | Boolean | Optional | false | Make FlowFile attributes available to the Punchlet.
Possible routes | String | Optional | - | Comma-separated list of routes to which FlowFiles can be sent.
Group by route | Boolean | Optional | false | Group output tuples into a single FlowFile per route.

Example :

With resources located in /home/user/nifi/standalone/, a configuration could look like :

Punchlets input mode : Inline code
Punchlets : { print([content]); }

Grok patterns input mode : Paths (files/dirs)
Grok patterns : /home/user/nifi/standalone/resources/patterns

Siddhi rules input mode : Paths (files/dirs)
Siddhi rules : /home/user/nifi/standalone/resources/standard/common/detection.rule

Json resources input mode : Paths (files/dirs)
Json resources : /home/user/nifi/standalone/resources/standard/apache_httpd/http_codes.json,
                 /home/user/nifi/standalone/resources/standard/apache_httpd/taxonomy.json

With Attributes : false
Possible routes : red,blue,green
Group by route : false

Validation

The punchlet is compiled during processor validation, before the processor executes. If compilation raises an exception, validation fails and the processor cannot be started.

Execution

Here's how the processor works :

  • The input FlowFile content is parsed to extract each line.
  • Each line is converted into a Tuple with two main keys :
    • content : FlowFile line content. If it is Json formatted, it is parsed and stored as a Tuple. If it is a raw string, it is stored as a string.
    • attributes : Optional. FlowFile attributes. Stored as a key-value map.
  • Each Tuple is processed using the punchlet.
  • Each resulting Tuple is converted to a Json string.
  • If "Group by route" is :
    • false : A new FlowFile is created for each processed line, and sent to the specified route, or to "default" if none is provided.
    • true : One FlowFile is created for each route specified, and tuples are appended to the matching FlowFile. These aggregated FlowFiles, one per route, are transferred once all lines have been processed.
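The steps above can be sketched in Python to make the data flow concrete. This is a simulation under stated assumptions, not the processor's implementation: `run_processor` and the `punchlet` callable (which returns the resulting tuple and an optional route) are hypothetical stand-ins, blank lines are assumed to be skipped, only Json objects are treated as "Json formatted", and in grouped mode only routes that actually received a tuple produce a FlowFile.

```python
import json
from collections import defaultdict


def run_processor(content, punchlet, attributes=None,
                  with_attributes=False, group_by_route=False):
    """Simulate one input FlowFile; return a list of (route, flowfile_content) pairs."""
    outputs = []
    grouped = defaultdict(list)
    for line in content.splitlines():
        if not line.strip():
            continue  # Assumption: blank lines are ignored.
        # Json lines are parsed, anything else is kept as a raw string.
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            parsed = line
        if not isinstance(parsed, dict):
            parsed = line  # Assumption: only Json objects count as Json formatted.
        tuple_ = {"content": parsed}
        if with_attributes:
            tuple_["attributes"] = dict(attributes or {})
        # Hypothetical punchlet contract: returns (result_tuple, route_or_None).
        result, route = punchlet(tuple_)
        route = route or "default"
        as_json = json.dumps(result)
        if group_by_route:
            grouped[route].append(as_json)
        else:
            outputs.append((route, as_json))
    if group_by_route:
        # One aggregated FlowFile per route, emitted after every line is processed.
        outputs = [(route, "\n".join(lines)) for route, lines in grouped.items()]
    return outputs
```

With `group_by_route=False` the function returns one pair per input line. With `group_by_route=True` it returns one pair per route, whose content holds one Json tuple per line.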

Routing

This processor allows you to dynamically create as many output relationships as you like.

The output route of a given tuple is set in the punchlet, using the Nifi operator :

nifi().addOutputRoute("red").into(root); 

If Group by route is set to true, the tuple will be added to the FlowFile attached to this route. If set to false, a new FlowFile will be created and sent to this route.

Examples

Let's take a simple example. As input, we have a multi-line FlowFile :

user=punch value=one
user=punch value=two
user=nifi value=three
value=four

We want to apply a key-value operator to extract the values as Json and route outputs according to user.

The punchlet would look like :

{
    kv().on([content]).into(root);
    remove([content]);
    if(![user].isEmpty()){
        nifi().addOutputRoute([user]).into(root);
    }
}

The properties to set would be :

Punchlets input mode : Inline Code
Punchlets            : <Code above>
Possible routes      : punch,nifi

The output tuples would look like :

{"user":"punch","value":"one"}

with the actual values depending on the input line.

We would have 2 FlowFiles in "punch" route, 1 FlowFile in "nifi" route and 1 FlowFile in "default" route, with a single json tuple each.
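To check these counts, the example can be reproduced with a small Python stand-in for the punchlet. `kv_punchlet` is a hypothetical helper that only mimics the key-value extraction and the routing on `user`; it does not use the Punch or Nifi APIs.

```python
import json
from collections import Counter

LINES = [
    "user=punch value=one",
    "user=punch value=two",
    "user=nifi value=three",
    "value=four",
]


def kv_punchlet(line):
    """Mimic the punchlet: extract key=value pairs, then route on the `user` field."""
    root = dict(pair.split("=", 1) for pair in line.split())
    # Tuples without a user fall back to the "default" route.
    route = root.get("user") or "default"
    return route, json.dumps(root, separators=(",", ":"))


# Group by route : false, so each line yields its own FlowFile.
flowfiles = [kv_punchlet(line) for line in LINES]
counts = Counter(route for route, _ in flowfiles)

for route, body in flowfiles:
    print(route, body)
# punch {"user":"punch","value":"one"}
# punch {"user":"punch","value":"two"}
# nifi {"user":"nifi","value":"three"}
# default {"value":"four"}
```

Here `counts` holds 2 for "punch", 1 for "nifi" and 1 for "default", matching the distribution described above.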

Group outputs : Let's say we prefer to have a single FlowFile in the "punch" route. We just need to set Group by route : true. The "punch" route would then receive one two-line FlowFile containing both Json tuples.

Work with attributes : Let's say we want to use the filename. We just need to set With Attributes : true. The filename can then be read in the punchlet using [attributes][filename].

Versioning

Not all PunchProcessor versions are compatible with all Nifi versions :

  • PunchProcessor versions 2.x only work with Nifi 1.11 or above.
  • PunchProcessor versions 1.x work on older Nifi versions, but do not support auto-loading (a Nifi restart is required) or multiple processor versions side by side (only one version of the processor per Nifi instance).