Punchlang

Punchlang, the Punch programming language, is an extension of Java designed to make it easy to manipulate and transform structured documents. It offers a handy and very compact way to format, filter, or enrich JSON documents.

Using punchlang, you write functions called punchlets that you execute in Java streaming applications. The PunchPlatform provides several ready-to-use punchlet runtime engines. Through simple configuration you can chain your punchlets in (distributed) data pipelines, where they act on the stream of data passing through. This approach has long been used, and still is, to run log parsers in charge of parsing and normalizing logs in cybersecurity or monitoring applications.

Note

The term "punchlets" comes by analogy with "servlets". A servlet is a function you deploy in a traditional HTTP servlet container. It makes it easy to write a web server. Punchlets are similarly deployed in a punchlet container, making it easy to write analytics stream or batch pipelines.

Punchlets can also serve a number of other use cases. They can be executed as part of any Java application. The PunchPlatform provides ready-to-use Spark- and Storm-based punchlet containers.

If you are familiar with Logstash configuration files, Punch will feel very natural: it follows the same idea but is more compact and much more expressive. Here is a punchlet that parses and transforms Arkoon logs.

{
    if (!grok("%{SYSLOGTIMESTAMP:[event][timestamp]} %{SYSLOGHOST:[observer][ip]} %{SYSLOGPROG}: AKLOG - %{GREEDYDATA:[aklog]}").on([logs][log])) {
        throw new PunchRuntimeException("invalid input log");
    }
    kv().on([aklog]).into([kv]);
    [obs][type] = [kv][fw];
    if ([kv][aktype] == "IP") {
        [event][type] = [kv][ip_log_type];
    }
}

The detailed Punch documentation is available as Javadoc. The rest of this chapter explains how to quickly write, run, and understand punchlets.

This chapter assumes you have gone through the Punchlets Getting started chapter. Your standalone platform provides a punchlet runner tool, which we are going to use next.

Punchlang is designed to handle JSON documents with a compact and lean syntax. For example, the Punch expression [user][name] = "alice" produces

{
  "user": {
    "name": "alice"
  }
}

Let us start from this small example.

Create your first punchlet

Open a terminal and enter:

punchplatform-puncher.sh -it -qjc "{print(root);}"

You should see the following output.

[INFO] Interactive mode, waiting for input

We are in interactive mode: your terminal is waiting for some input. Now copy and paste this input example and press Enter:

{"user": {"name": "alice"}}

You should get the same JSON back, pretty-printed:

{
  "user": {
    "name": "alice"
  }
}

Good, you just ran your first punchlet! Let us explain the punchplatform-puncher.sh options you just used:

  • -it stands for interactive mode, -q for quiet mode.
  • -j makes the puncher expect JSON input (try without it).
  • The code after the -c argument is the Punch code snippet, i.e. the punchlet itself. It is enclosed in braces because it is a function (see Punchlets explained).
  • The print() function is executed with a single argument, root. root refers to the input received by the punchlet. It is called that way because it represents the root of a JSON document, possibly nested.
  • The Punch syntax is similar to Java, so each instruction must end with a semicolon (;).
  • Under the covers, the Punch engine reads your input JSON document and turns it into a structured tree. Each node in that tree is a JSON value, called a Tuple. A Tuple can be a dictionary, an array, a set of other inner Tuples, or a so-called leaf value (long, double, boolean, string).

Coming back to your example punchlet: the root Tuple is the top-level document. It is a dictionary containing key-value pairs.

The whole point of the Punch language is to make it extra easy to manipulate Tuples. You can access inner values using brackets. For example, to access your user name: root:[user][name]. And because root is a reserved language idiom, you can simply refer to it as [user][name].

Let us now add a field to our user, say her age:

punchplatform-puncher.sh -it -qj -c '{[user][age] = 22; print(root);}'
[INFO] Interactive mode, waiting for input
{"user": {"name": "alice"}}
{
  "user": {
    "name": "alice",
    "age": 22
  }
}
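
For readers coming from plain Java, the following sketch gives a rough idea of the bookkeeping that a path assignment such as [user][age] = 22 spares you. It relies only on java.util maps and is not the Punch Tuple API; the names NestedPathSketch and set are hypothetical and chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: plain java.util maps, not the Punch Tuple API.
class NestedPathSketch {

    // Sets a leaf value at a non-empty path, creating intermediate maps as needed.
    @SuppressWarnings("unchecked")
    static void set(Map<String, Object> root, Object value, String... path) {
        Map<String, Object> current = root;
        for (int i = 0; i < path.length - 1; i++) {
            Object child = current.get(path[i]);
            if (!(child instanceof Map)) {
                // Missing (or non-map) intermediate node: replace it with a new map.
                child = new LinkedHashMap<String, Object>();
                current.put(path[i], child);
            }
            current = (Map<String, Object>) child;
        }
        current.put(path[path.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> root = new LinkedHashMap<>();
        set(root, "alice", "user", "name"); // analogous to [user][name] = "alice"
        set(root, 22, "user", "age");       // analogous to [user][age] = 22
        System.out.println(root);           // prints {user={name=alice, age=22}}
    }
}
```

In Punch the same two assignments take one short line each, which is where the compactness comes from.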

Writing one-liner punchlets is tedious. Let us now write cleaner small programs in .punch files. Create a file named mypunchlet.punch.

To make it easy to test, the @test annotation lets you define the default test input data of your punchlet.

// @test(encoding=json) {"user":{"name": "alice"}}
{
  [user][age] = 22 ;
}

Then simply type:

punchplatform-puncher.sh mypunchlet.punch
{
  "user": {
    "name": "alice",
    "age": 22
  }
}

That's it: you have written a punchlet that is ready to be executed in a high-performance data pipeline using the Punch Bolt or the Punch Stage.

Sublime Text is Punch's best friend

A Sublime Text plugin is provided to turn the Sublime Text editor into a punchlet editor. It will dramatically ease your punchlet programming experience, without the need for a heavy SDK. To install it, execute the following commands:

cd $PUNCHPLATFORM_CONF_DIR/resources/contrib/sublimeText3
./install.sh

Then, open mypunchlet.punch from Sublime Text (see the associated README.md for more details), hit Ctrl + B and watch the result.

Note

For Vim, add the lines au BufRead,BufNewFile *.punch setfiletype java and :nnoremap <C-B> :term punchplatform-puncher.sh -jp %<CR> to your .vimrc file.

Here is an example of running a punchlet in Sublime Text. The punchlet appears in the top part; the bottom part shows the result of running it directly from the editor.

image

Java and operators

The rest of this chapter covers Punch programming concepts. Remember that Punch is just a Java wrapper, so you can use what you already know from the Java language. In addition, there are plenty of methods that you can use on Tuples. For instance, write this in a file:

{
  [user][category] = "child";
  if ([user][category].isEquals("child")) {
    print("Hello, kiddo!");
  } else {
    print("Good morning sir.");
  }
  if ([user][city].isEmpty()) {
    print("No city provided.");
  }
}

Execute it as follows:

punchplatform-puncher.sh -q -p mypunchlet.punch
Hello, kiddo!
No city provided.

You can also use several operators to split strings into fields, for instance:

  • kv: to parse key-value strings (e.g. field1=value1 field2=value2 ... fieldN=valueN)
  • csv: to parse CSV-like texts (e.g. value1;value2;...;valueN)
  • grok: to use the power of Grok regular expressions to extract arbitrarily complex patterns
  • dissect: to parse kv or csv content efficiently

Here is a csv example:

// @test(encoding=json) {"input":"firstfield;secondfield"}
{
  csv().on(root:[input]).into(root:[csv]);
}
punchplatform-puncher.sh -p mypunchlet.punch
{
  "input": "firstfield;secondfield",
  "csv": {
    "field1": "secondfield",
    "field0": "firstfield"
  }
}
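
To make the result above less magical, here is a rough plain-Java sketch of the splitting step. It is not how the Punch csv operator is implemented; it only assumes, based on the output shown above, a ';' separator and generated field names of the form field0, field1, and so on. The names CsvSketch and split are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration: mimics the shape of the csv output, not the Punch operator itself.
class CsvSketch {

    // Splits the input on the separator and names each value field0, field1, ...
    static Map<String, String> split(String input, String separator) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Pattern.quote treats the separator literally; -1 keeps trailing empty values.
        String[] values = input.split(Pattern.quote(separator), -1);
        for (int i = 0; i < values.length; i++) {
            fields.put("field" + i, values[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {field0=firstfield, field1=secondfield}
        System.out.println(split("firstfield;secondfield", ";"));
    }
}
```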

See the Tuple section of the Java documentation for all the available functions.

The @test punchlet annotation

In the previous examples some punchlets were prefixed with a comment line containing the @test annotation. This annotation is very helpful for associating default example input data with the punchlet. It is used by the punchplatform-puncher.sh tool, so that you can test your punchlet without the burden of setting up external injectors.

The @test annotation can contain additional options to better control the input format and fields expected by your punchlet.

Option   | Values         | Default
fields   | [a][b][c][...] | Empty for JSON, [logs][log] for text
encoding | json           | -

For example, to insert a raw payload into the default input field [logs][log], use no option:

// @test my raw input text
{
  print(root);
}

This will inject a string into your punchlet. To insert the same payload under some inner fields, for example [logs][data], use:

// @test(fields=[logs][data]) my raw input log
{
  print(root);
}

To insert a JSON document as your root document instead:

// @test(encoding=json) {"logs": {"log": "my raw input log"}, "name": "bob"}
{
  print(root);
}

You can combine the two options as follows:

// @test(encoding=json, fields=[logs][data]) {"a": "foo", "b": "bar"}
{
  print(root);
}

Local Variable scope

A punchlet can declare local Tuples, scoped to your function. When you return from your punchlet, their content is discarded. They are very useful for working on JSON content without altering the root Tuple.

{ 
    Tuple tmp;
    // fill it the way you want 
    tmp:[name] = "bob";

    // "bob" does not affect the root input data. 
    // It will be discarded at function return
}

One last important point. In some cases you want to alter a Tuple's content from the top, i.e. completely overwrite its content with something else. You need a way to refer to the top value of the Tuple. Here is how:

{  
    // the ":/" notation refers to the top value. 
    root:/ = "new content";
}

You might ask: why not just write the following?

{  
    root = "new content";
}

Because the root variable is a reference to a map structure. If you assign it a new value (here a reference to a String), you do not alter the content of the root Tuple passed to your punchlet. You only change the local reference that was passed to you as an argument.
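
The same behavior can be observed in plain Java, which may help if the distinction is new to you. The sketch below is an analogy using java.util maps, not Punch Tuples: rebinding a parameter leaves the caller's object untouched, while modifying the object through the reference is visible to the caller. The names RootReferenceSketch, reassign, and overwrite are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: plain Java parameter passing, used as an analogy for root.
class RootReferenceSketch {

    // Rebinds the local parameter only; the caller's map is left untouched.
    static void reassign(Map<String, Object> root) {
        root = new HashMap<>();
        root.put("content", "new content");
    }

    // Modifies the object the reference points to; the caller sees the change.
    static void overwrite(Map<String, Object> root) {
        root.clear();
        root.put("content", "new content");
    }

    public static void main(String[] args) {
        Map<String, Object> root = new HashMap<>();
        root.put("user", "alice");

        reassign(root);
        System.out.println(root); // prints {user=alice}

        overwrite(root);
        System.out.println(root); // prints {content=new content}
    }
}
```

In this analogy, reassign corresponds to writing root = "new content", and overwrite plays the role of the root:/ notation.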

Reference versus Deep Copies

Tuples are designed for performance. In particular, Tuple operations work by reference. In other words you work with pointers: altering a value through one pointer makes the change visible through all the other pointers to it. You may sometimes need a deep copy to avoid altering the original Tuple. Check out the following snippet of code to make sure this is clear to you:

{
    [along] = 3;
    [astring] = "hello world!";
    [user][name] = "bob";

    // References a pointer to that object
    Tuple pointer = root;

    // The next instruction will alter the value of root:[along]
    pointer:[along] = 5;

    // In contrast, the next instructions are safe and leave the root Tuple unaltered
    Tuple deepCopy = root.duplicate();
    deepCopy:[along] = 5;
}
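
If you want to experiment with the idea outside of Punch, the following plain-Java sketch reproduces the two situations with nested java.util maps. It is an analogy only and does not show how Tuple duplicate() is implemented; the deepCopy helper and the class name DeepCopySketch are hypothetical, and the helper only recurses into nested maps (lists are not handled).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: aliasing versus deep copy with plain java.util maps.
class DeepCopySketch {

    // Recursively copies nested maps; leaf values are shared, which is safe for immutable leaves.
    @SuppressWarnings("unchecked")
    static Map<String, Object> deepCopy(Map<String, Object> source) {
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            copy.put(entry.getKey(),
                     value instanceof Map ? deepCopy((Map<String, Object>) value) : value);
        }
        return copy;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "bob");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("along", 3);
        root.put("user", user);

        // An alias points to the same object: the change is visible through root.
        Map<String, Object> pointer = root;
        pointer.put("along", 5);

        // A deep copy is independent, including its nested maps.
        Map<String, Object> copy = deepCopy(root);
        copy.put("along", 7);
        ((Map<String, Object>) copy.get("user")).put("name", "alice");

        System.out.println(root); // prints {along=5, user={name=bob}}
        System.out.println(copy); // prints {along=7, user={name=alice}}
    }
}
```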

Where to go next?

If you are interested in writing a new log parser or something similar, we suggest you move on to the standard log parsers chapter.

If you want more details on the Punch language capabilities, go to the Punchlets explained and Working with tuples chapters.