
Punch Programming

Punch is an extension of Java, designed to easily manipulate and transform structured documents. It offers a handy and extremely compact way to format, filter, or enrich JSON documents.

Using Punch, you write functions called punchlets that you execute in Java streaming applications. The PunchPlatform provides several ready-to-use punchlet runtime engines. With simple configuration, you can chain your punchlets into (distributed) data pipelines that act on the traversed stream of data. This is, in particular, how log parsers are run: they parse and normalize logs in cybersecurity or monitoring applications.

Note

The term "punchlets" is an analogy with "servlets". A servlet is a function you deploy in a traditional HTTP servlet container, which makes it easy to write a web server. Punchlets are similarly deployed in a punchlet container, making it easy to write stream or batch analytics pipelines.

Punchlets can actually serve many other use cases: they can be executed as part of any Java application. The PunchPlatform provides ready-to-use Spark- and Storm-based punchlet containers.

If you are familiar with Logstash configuration files, Punch will feel natural: the idea is the same, but Punch is more compact and much more expressive. Here is a punchlet that parses and transforms Arkoon logs.

{
    // Split the syslog header and keep the Arkoon payload in [aklog].
    if (!grok("%{SYSLOGTIMESTAMP:[event][timestamp]} %{SYSLOGHOST:[observer][ip]} %{SYSLOGPROG}: AKLOG - %{GREEDYDATA:[aklog]}").on([logs][log])) {
        throw new PunchRuntimeException("invalid input log");
    }
    // Parse the key=value pairs of the payload into [kv].
    kv().on([aklog]).into([kv]);
    [obs][type] = [kv][fw];
    if ([kv][aktype] == "IP") {
        [event][type] = [kv][ip_log_type];
    }
}
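To make the grok step more concrete, here is a rough Java approximation of what its pattern extracts, written with java.util.regex. It is only a sketch: the regular expressions standing in for SYSLOGTIMESTAMP, SYSLOGHOST and SYSLOGPROG are simplified assumptions rather than the real grok definitions, and the ArkoonHeaderSketch class and the sample log line are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a simplified Java stand-in for the grok step of the Arkoon punchlet.
class ArkoonHeaderSketch {
    // Assumed approximations of the grok patterns (not their real definitions):
    // SYSLOGTIMESTAMP ~ "Mmm dd hh:mm:ss", SYSLOGHOST ~ one non-blank token,
    // SYSLOGPROG ~ a program name with an optional [pid].
    private static final Pattern HEADER = Pattern.compile(
            "(?<timestamp>[A-Z][a-z]{2} +\\d{1,2} \\d{2}:\\d{2}:\\d{2}) "
            + "(?<host>\\S+) "
            + "(?<prog>[^:\\s\\[]+(?:\\[\\d+\\])?): AKLOG - (?<aklog>.*)");

    static Map<String, String> parse(String log) {
        Matcher m = HEADER.matcher(log);
        if (!m.matches()) {
            // Same idea as the punchlet: reject lines that do not fit the pattern.
            throw new IllegalArgumentException("invalid input log");
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("timestamp", m.group("timestamp"));
        fields.put("host", m.group("host"));
        fields.put("aklog", m.group("aklog")); // remainder, later handed to kv()
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not a real Arkoon log.
        String sample = "Jan 12 10:15:42 fw01 akfw[123]: AKLOG - fw=fw01 aktype=IP";
        System.out.println(parse(sample));
        // {timestamp=Jan 12 10:15:42, host=fw01, aklog=fw=fw01 aktype=IP}
    }
}
```

The real grok patterns are more permissive (for example, they accept IP addresses or FQDNs as hosts), so treat this regex as an illustration of the extraction, not a replacement for it.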

The detailed Punch reference is available as Javadoc-based documentation. The rest of this chapter explains how to quickly write, run, and understand punchlets.

This chapter assumes you have gone through the Punchlets Getting Started guide. Your standalone platform provides a punchlet runner tool, punchplatform-puncher.sh, which we use next.

The Punch language is designed to handle JSON documents with a compact and lean syntax. For example, the Punch expression [user][name] = "alice" produces:

{
  "user": {
    "name": "alice"
  }
}
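In plain Java terms, that assignment walks the path and creates any missing intermediate dictionary before setting the leaf value. The sketch below models this with nested LinkedHashMap objects; the PathAssignSketch class and its set method are hypothetical illustrations, not part of the Punch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the Java equivalent of the Punch assignment [user][name] = "alice".
class PathAssignSketch {
    @SuppressWarnings("unchecked")
    static void set(Map<String, Object> root, Object value, String... path) {
        Map<String, Object> current = root;
        for (int i = 0; i < path.length - 1; i++) {
            // Create the intermediate dictionary if it does not exist yet.
            // (An existing non-dictionary value would make this cast fail.)
            current = (Map<String, Object>) current.computeIfAbsent(
                    path[i], key -> new LinkedHashMap<String, Object>());
        }
        current.put(path[path.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> root = new LinkedHashMap<>();
        set(root, "alice", "user", "name");
        System.out.println(root); // {user={name=alice}}
    }
}
```

Calling set(root, 22, "user", "age") afterwards adds a sibling key to the existing user dictionary instead of replacing it.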

Let us start from this small example.

Create your first punchlet

Open a terminal and enter:

$ punchplatform-puncher.sh -it -qjc "{print(root);}"

You should see the following output:

[INFO] Interactive mode, waiting for input

We are now in interactive mode: your terminal is waiting for input. Copy and paste this input example and press Enter:

{"user": {"name": "alice"}}

You should get the same JSON back, pretty-printed:

{
  "user": {
    "name": "alice"
  }
}

Good, you just ran your first punchlet! Let us explain the punchplatform-puncher.sh options you just used:

  • The -it stands for interactive mode, -q for quiet mode.
  • -j makes the puncher expect JSON input (try without it).
  • The code after the -c argument is the Punch code snippet, i.e. the punchlet itself. It is enclosed in braces because it is a function (see the Punchlets Explained chapter).
  • The print() function is executed with a single argument, root. root refers to the input received by the punchlet. It is named that way because it represents the root of a JSON document, possibly nested.
  • The Punch syntax is similar to Java, so each instruction must end with a semicolon (;).
  • Under the covers, the Punch engine reads your input JSON document and turns it into a tree. Each node in that tree is a JSON value, called a Tuple. A Tuple can be a dictionary, an array, a set of other inner Tuples, or a so-called leaf value (long, double, boolean, string).

Coming back to your example punchlet: the root Tuple is the top-level document. It is a dictionary containing a single key-value pair. The key is "user", and the value is another Tuple, {"name": "alice"}. That Tuple is itself a dictionary. You get the idea.

The whole point of the Punch language is to make it extra easy to manipulate Tuples. You access inner values using brackets; for example, to access the user name, write root:[user][name]. Because root is a reserved language idiom, you can simply write [user][name].
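Bracket access can be pictured as a lookup through nested dictionaries. The following Java sketch is a hypothetical illustration (BracketPathSketch is not a Punch class): it splits a path such as [user][name] into keys and follows them from the root, returning null when a key is missing. It does not handle the explicit root: prefix, and the real Tuple API may treat missing fields differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: resolving a bracket path against nested dictionaries.
class BracketPathSketch {
    private static final Pattern SEGMENT = Pattern.compile("\\[([^\\]]+)\\]");

    // Splits "[user][name]" into ["user", "name"].
    static List<String> segments(String path) {
        List<String> keys = new ArrayList<>();
        Matcher m = SEGMENT.matcher(path);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    // Follows the keys from the root; returns null if a key is missing
    // or an intermediate value is not a dictionary.
    static Object get(Map<String, Object> root, String path) {
        Object current = root;
        for (String key : segments(path)) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> root = Map.of("user", Map.of("name", "alice"));
        System.out.println(get(root, "[user][name]")); // alice
        System.out.println(get(root, "[user][city]")); // null
    }
}
```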

Let us now add another field to our user, say, its age:

$ punchplatform-puncher.sh -it -qj -c '{[user][age] = 22; print(root);}'
[INFO] Interactive mode, waiting for input
{"user": {"name": "alice"}}
{
  "user": {
    "name": "alice",
    "age": 22
  }
}

Writing one-liner punchlets is tedious, so let us now write small programs in .punch files. Create a file named mypunchlet.punch.

To make testing easy, the @test annotation lets you define the default test input data of your punchlet.

$ cat mypunchlet.punch
// @test(encoding=json) {"user":{"name": "alice"}}
{
  [user][age] = 22 ;
  print([user]);
}

$ punchplatform-puncher.sh -qp mypunchlet.punch
{
  "name": "alice",
  "age" : 22
}

That's it: you have written a punchlet ready to be executed in a high-performance data pipeline using the Punch Bolt or the Punch Stage.

Sublime Text is Punch's best friend

A Sublime Text plugin is provided to turn the Sublime Text editor into a punchlet editor. It greatly eases punchlet programming, without the need for a heavy SDK. To install it, execute the following commands:

$ cd $PUNCHPLATFORM_CONF_DIR/resources/contrib/sublimeText3
$ ./install.sh

Then, open mypunchlet.punch from Sublime Text (see the associated README.md for more details), hit Ctrl + B and watch the result.

Note

For Vim, add the following lines to your .vimrc file:

    au BufRead,BufNewFile *.punch setfiletype java
    :nnoremap <C-B> :term punchplatform-puncher.sh -jp %<CR>

Here is an example of running a punchlet in Sublime Text. The punchlet appears in the top part; the bottom part shows the result of running it directly from the editor.

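[Figure: a punchlet open in Sublime Text, with the punchlet in the top pane and the result of running it in the bottom pane]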

Java and operators

The rest of this chapter covers Punch programming concepts. Remember that Punch builds on Java, so you can use what you already know from the Java language. In addition, plenty of methods are available on Tuples, for instance:

$ cat usercategory.punch
{
  [user][category] = "child";
  if ([user][category].isEquals("child")) {
    print("Hello, kiddo!");
  } else {
    print("Good morning sir.");
  }
  if ([user][city].isEmpty()) {
    print("No city provided.");
  }
}

$ punchplatform-puncher.sh -q -p usercategory.punch
Hello, kiddo!
No city provided.

You can also use several operators to split strings, for instance:

  • kv: to parse key-value strings (e.g. field1=value1 field2=value2 ... fieldN=valueN)
  • csv: to parse CSV-like texts (e.g. value1;value2;...;valueN)
  • grok: to extract arbitrarily complex patterns using Grok regular expressions
  • dissect: to parse kv or csv text efficiently
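To illustrate what a kv operator produces, here is a minimal Java sketch of key-value splitting. It assumes space-separated pairs with = as the separator and no quoting; the KvSketch class is hypothetical, and the actual kv operator may support other separators and options, so check its documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of key-value splitting, e.g. "field1=value1 field2=value2".
class KvSketch {
    static Map<String, String> parse(String text) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            // Skip tokens that are not of the form key=value (no key, or no '=').
            if (eq > 0) {
                // Split on the first '=' only, so values may themselves contain '='.
                out.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("fw=fw01 aktype=IP"));
        // {fw=fw01, aktype=IP}
    }
}
```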

Operators guide:

$ cat csvsample.punch
// @test(encoding=json) {"input":"firstfield;secondfield"}
{
  csv().on(root:[input]).into(root:[csv]);
}

$ punchplatform-puncher.sh -p csvsample.punch
{
  "input": "firstfield;secondfield",
  "csv": {
    "field1": "secondfield",
    "field0": "firstfield"
  }
}
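The csv result above can be reproduced with a few lines of Java, which shows where the generated field0, field1 keys come from. This is a hypothetical sketch (CsvSketch is not a Punch class) with no support for quoted values; keeping trailing empty values is a choice of this sketch, not documented Punch behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: store each separated value under a generated key field0, field1, ...
class CsvSketch {
    static Map<String, String> parse(String text, String separator) {
        Map<String, String> out = new LinkedHashMap<>();
        // Pattern.quote treats the separator literally; limit -1 keeps trailing empty values.
        String[] values = text.split(Pattern.quote(separator), -1);
        for (int i = 0; i < values.length; i++) {
            out.put("field" + i, values[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("firstfield;secondfield", ";"));
        // {field0=firstfield, field1=secondfield}
    }
}
```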

See the Javadoc documentation, in the Tuple section, for all the available functions.

The @test punchlet annotation

In the previous examples, some punchlets were prefixed with a comment line containing the @test annotation. This annotation associates default example input data with the punchlet. It is used by the punchplatform-puncher.sh tool, so that you can test your punchlet without the burden of setting up external injectors.

The @test annotation can contain additional options to better control the input format and fields expected by your punchlet.

Option    Values            Default
encoding  json              raw text
fields    [a][b][c][...]    empty for JSON, [logs][log] for text

For example to insert a raw payload into the default input field [logs][log], use no option:

// @test my raw input text
{
  print(root);
}

This will inject a string into your punchlet. To insert the same payload under some inner fields, for example [logs][data], use:

// @test(fields=[logs][data]) my raw input log
{
  print(root);
}

To insert a JSON document as your root document instead:

// @test(encoding=json) {"logs": {"log": "my raw input log"}, "name": "bob"}
{
  print(root);
}

You can combine the two options as follows:

// @test(encoding=json, fields=[logs][data]) {"a": "foo", "b": "bar"}
{
  print(root);
}
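The annotation rules above can be summarized as a small parser for the annotation line. The sketch below is hypothetical and is not the puncher's actual implementation: the TestAnnotationSketch class is invented for illustration, and the label "text" for the default raw encoding is our own. It applies the defaults described above: raw text goes to [logs][log], and a JSON payload becomes the root document unless fields is set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split a "// @test(...) payload" line into options and payload.
class TestAnnotationSketch {
    // "// @test", optionally followed by "(opt=value, ...)", then whitespace and the payload.
    private static final Pattern LINE =
            Pattern.compile("//\\s*@test(?:\\(([^)]*)\\))?\\s+(.*)");

    static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a @test line: " + line);
        }
        Map<String, String> options = new LinkedHashMap<>();
        if (m.group(1) != null) {
            for (String option : m.group(1).split(",")) {
                String[] kv = option.trim().split("=", 2);
                if (kv.length == 2) {
                    options.put(kv[0].trim(), kv[1].trim());
                }
            }
        }
        // "text" is this sketch's own label for the default raw encoding.
        String encoding = options.getOrDefault("encoding", "text");
        // Defaults: JSON becomes the root document (empty path), raw text goes to [logs][log].
        String defaultFields = encoding.equals("json") ? "" : "[logs][log]";

        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("encoding", encoding);
        spec.put("fields", options.getOrDefault("fields", defaultFields));
        spec.put("payload", m.group(2));
        return spec;
    }

    public static void main(String[] args) {
        System.out.println(parse("// @test(fields=[logs][data]) my raw input log"));
        // {encoding=text, fields=[logs][data], payload=my raw input log}
    }
}
```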

Where to go next?

If you are interested in writing a new log parser or something similar, we suggest you move on to the standard log parsers chapter.

If you want more details on the Punch language capabilities, go to the Punchlets Explained and Working with Tuples chapters.