Punchlet As a Function
Abstract
A punchlet is a function that transforms some input data. It is meant to be deployed in a punchlet runtime engine, which is typically part of a data processing pipeline.
Punchlets Explained
Consider a simple example: the following punchlet adds one field to whatever input data it is given:
{ [user][age] = 22; }
This syntax is short and easy. Here is how it actually works. A punchlet is a Java class instance. Its actual signature is defined as follows:
/*
 * Your punchlet is an object implementing some interface.
 * A punchlet is created and injected with some fields, in particular
 * resource JSON files, grok patterns, a so-called world Tuple to give your
 * punchlet access to the outside world, etc.
 *
 * A punchlet must implement a single 'execute' function, the one that will be executed
 * with the input data.
 */
class Punchlet {
    /*
     * @param root : the root Tuple, containing the data (logs, events, whatever)
     */
    void execute(Tuple root) {
        // your punchlet code
        root:[user][age] = 22;
    }
}
The Punch short notation exists only to make this simpler. When you write
{ [user][age] = 22; }
the Punch compiler takes care of filling in the rest, and at runtime additional resources are injected into your punchlet.
When you deploy that punchlet in a stream of data, it is applied to each data item that traverses the stream.
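To make the per-item behaviour concrete, here is a minimal plain-Java sketch. It is not the Punch API: the `Punchlet` interface below, the nested `Map` used in place of the real `Tuple` type, and the `applyToStream` helper are all hypothetical stand-ins chosen for illustration. It only shows the idea that one `execute` function is called once per item flowing through the stream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a punchlet runtime; none of these names come from Punch.
class PunchletStreamSketch {

    // Stand-in for the punchlet interface: a single 'execute' method per input item.
    interface Punchlet {
        void execute(Map<String, Object> root);
    }

    // Plain-Java equivalent of the punchlet { [user][age] = 22; } on a nested map.
    @SuppressWarnings("unchecked")
    static void setUserAge(Map<String, Object> root) {
        Map<String, Object> user = (Map<String, Object>) root.computeIfAbsent(
                "user", key -> new LinkedHashMap<String, Object>());
        user.put("age", 22);
    }

    // The "runtime": applies the same punchlet to every item of the stream, in order.
    static List<Map<String, Object>> applyToStream(Punchlet punchlet,
                                                   List<Map<String, Object>> items) {
        for (Map<String, Object> item : items) {
            punchlet.execute(item);
        }
        return items;
    }

    // Builds an input item shaped like { "user": { "name": <name> } }.
    static Map<String, Object> userNamed(String name) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", name);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("user", user);
        return root;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> items = new ArrayList<>();
        items.add(userNamed("bob"));
        items.add(userNamed("alice"));

        applyToStream(PunchletStreamSketch::setUserAge, items);

        // Every item now carries user.age = 22 next to its original fields.
        System.out.println(items);
    }
}
```

In this sketch the punchlet mutates each item in place, which mirrors the way the example punchlet adds a field to the root tuple it receives.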
Packaging Punchlets
You can deliver punchlet files in several ways.
The simplest is a standalone punchlet file:
{ [user][age] = 22; }
// @test(fields=[user][name]) bob
{ [user][age] = 22; }
A better way is to ship your punchlet as part of a YAML file. You can then add a description, more sophisticated input data, and resources, an important punchlet feature described later on. Here is an example:
description: >
  Here is a simple example of the ipmatch operator.
  You want to know if an IP is in a list of ranges.
tests:
  - logs:
      log: 172.16.0.2
  - logs:
      log: 5.36.18.2
resources:
  ranges:
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16
    - 127.0.0.1/32
punchlet: >
  {
    // Retrieve our IP domains from a punch resource.
    // Resources come from outside the punchlet. In this
    // example they are defined above, but in production
    // they typically come from an S3 store, a remote filesystem,
    // etc.
    Tuple ranges = getResourceTuple("ranges");
    // You can use that resource tuple like any other.
    [check] = ipmatch(ranges).contains([logs][log]);
  }
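To clarify what the range check in this example is expected to compute for the two test logs, here is a plain-Java sketch of CIDR containment. It is not the implementation of `ipmatch`, which this page does not show; it assumes the usual IPv4 CIDR semantics (an address matches a range when its leading prefix bits are equal), and the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical illustration of IPv4 CIDR containment; not the Punch ipmatch operator.
class IpRangeSketch {

    // Converts a dotted IPv4 address such as "172.16.0.2" to a 32-bit integer.
    static int toInt(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + ipv4);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // True if 'ip' falls inside the range written as "network/prefixLength".
    static boolean inRange(String cidr, String ip) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // A /0 range matches everything; shifting an int by 32 would not give 0 in Java.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(parts[0]) & mask) == (toInt(ip) & mask);
    }

    // True if 'ip' falls inside at least one of the given ranges.
    static boolean containsAny(List<String> ranges, String ip) {
        for (String range : ranges) {
            if (inRange(range, ip)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same ranges and test addresses as the YAML example.
        List<String> ranges = List.of(
                "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.1/32");
        System.out.println(containsAny(ranges, "172.16.0.2")); // true: inside 172.16.0.0/12
        System.out.println(containsAny(ranges, "5.36.18.2"));  // false: in none of the ranges
    }
}
```

Under these assumptions, the first test log would be flagged as belonging to the listed ranges and the second would not.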
If you use the Punch language to write log parsers, Punch provides an official Maven-based packaging that lets you add unit tests and sample log test files in a well-structured package. Refer to the online parser repository.