Skip to content

Parser Development


This guide explains how to develop, package and deploy log parsers to a production platform.

Parser versus Punchlet

It is important to first understand what exactly is is a parser and how it differs from a punchlet.

Using punchlets you can write small to medium size and complexity arbitrary functions that you deploy in data processing pipelines.

Log management is particular in that you need many of these functions to parse, normalize, enrich the incoming logs. A punch parser refers to one or several of such punchlets dedicated to a given log equipment, technolgy and/or vendor.

Technically speaking, a parser is not different than a set of punchlets, possibly packed with grok patterns or additional configuration files. However parsers are designed with care, must conform to strict normalisation rules and models and most importantly must be designed to be reused.


For these reasons in the rest of this guide we will refer to parsers. This said whatever is explained is also valid for any kind of punchlets.

Parser Development Toolkit

To develop, package, ship and share parsers safely, the punch provides a simple yet robust tooling.


  • java: 1.8 for punch releases 6.3, java 11 for releases 7.x.
  • maven: v3.6.2 or higher.
  • the tools. It is shipped with the punch standalone. You need it to develop, run and test your parsers.

The tool can also be downloaded separately. Download the jar file from this download area. You should get a jar file named punchplatform-puncher-6.3.4-jar-with-dependencies.jar and the companion startup script

To execute it simply type in

If it is properly installed it will display its online help.


The punch leverages maven to provide you with a test and packaging toolkit. Here is how it works. Let us assume you want to package a parser for your company (say) 'com.yourcompany'.

First choose the name of your artefact. Say "firewall-parsers". Use the punch maven archetype generator to create your maven worskpace automatically:

mvn archetype:generate \
    -DarchetypeGroupId=org.thales.punch \
    -DarchetypeArtifactId=parser \
    -DarchetypeVersion=1.0.0 \
    -DgroupId=com.mycompany \

That will generate a fresh startup project. Its layout is as follows:

├── assembly
│   └── assembly.xml
├── pom.xml
├── src
│   └── com
│       └── mycompany
│           └── sample
│               ├── MANIFEST.yml
│               ├── groks
│               │   └── pattern.grok
│               ├── enrich.punch
│               ├── parser.punch
│               ├── resources
│               │   └── color_codes.json
│               └── test
│                   ├── sample.txt
│                   ├── unit_chain.json
│                   └── unit_punchlet.json
└── tools


  • com/mycompany/parsers/samplecorresponds to the fully qualified name of your parsers com.mycompany.parsers.sample. That name prefix uniquely identifies your package and your punchlets inside.
  • parser.punch and enrich.punch are sample punchlets. Check it out it illustrates the basics. This is where you write the actual logic of your log parsing or more generally data transformation.
  • groks/pattern.grok is a sample grok pattern. The punch comes with many patterns directly loaded, but here is how you can add your own.
  • resources/color_code.json is a sample resource files. In this sample it is used to add a numerical color code from a color string value ('red' or 'green').
  • test/unit_chain.json and test/unit_punchlet.json are punch unit test files. That lets you define unit tests to ensure each punchlet or a sequence of punchlets behave exactly as you expect.
  • test/sample.txt is an example a sample log file. These can be used to test a large number of logs.

It provides a complete yet simple example of what you can do with punchlets.

Test and Package

From there simply execute :

cd firewall-parsers

mvn clean test
to only test the punchlets, or
mvn clean install
to test then package and install the resulting archive.Y our punchlet package in located inside 'target/', as well as in your local laptop maven repository (located under '\~/.m2/repository').


Al these are standard maven concepts and goodies. The punch simply leverage maven to produce velan and versionned packages. What is explained here for parser is similar for other punch (java or python) development.


As part of punch parser projects, you can write two types of tests that will automatically be executed.

Unit Tests

Unit tests are designed to test precisely one or a chain of punchlet(s). A unit test is designed using a simple json file expressing the input log and the expectated parsed fields after that log has been processed by the punchlet.

The tools will automatically play all unit test files you include in your test project folder. For details refer to the manual page and its online help.

Sample Tests

Sample tests use sample log files to simply check that all of them are parsed without errors. Sample tests are useful when working with customer log extracts. The tools will automatically play all sample files you include in your test project folder.

For details refer to the manual page and its online help.

Performance Test

The also provides options to evaluate your punchlet performance. Refer to how to write a robust parser. You will go through a tutorial to test the performance of our standard Apache parser.

You will then obtain a representative EPS number on a single thread (on some reference laptop architecture). That number is very useful to later on tune your punchlines to achieve your target throughput.


Filesystem Deployment

The first and simplest method is to transfer your punchlets, patterns and resource files to you local per tenant configuration tree. This topic is explained in detail in the next chapter.

As a preview what you must do is to copy your files to a special punchplatform configuration folder:

└── conf
    └── tenants
        └─-─ sampletenant
            └── channels
            │   └── samplechannel
            │       ├── channel_structure.yml
            │       └── samplepunchline.yml
            └── resources
                └── punch
                    └── com
                       └── mycompany
                           └── sample
                               ├── MANIFEST.yml
                               ├── groks
                               │   └── pattern.grok
                               ├── enrich.punch
                               ├── parser.punch
                               └── resources
                                   └── color_codes.json

From there you can use your punchlets inside your channels.

Note that using this method, maven is not used anymore. You loose the version information. If you use that method ensure you use git to protect your configuration folder and keep track of your change history.


Always maintain the same tree structure (here 'com/mycompany/sample') in your layout. That tree structure is a universal and recommended pattern used in maven and java. If you conform to that, your punchlines will run fine on other deployed punch platforms.

Package Based Installation


This method is supported only from release 6.3.4 and higher.

You can alternatively ship the maven generated archive directly to the repository folder located in your configuration folder. Here it is illustrated:

└── conf
     ├─── repository
     │       └───
     └── tenants
        └─-─ sampletenant
            └─── channels
                └── samplechannel
                    ├── channel_structure.yml
                    └── samplepunchline.yml

From there you can simply refer to your parser various resources in punchlines as follows:

version: "6.3.4"
name: dhcp-parser
runtime: storm
- punch-parser:com.mycompany:firewall-parsers:1.0.0
- type: syslog_input
      proto: tcp
      port: 9902
  - stream: logs
    - log
- type: punchlet_node
    - com/mycompany/sample/resources/color_codes.json
    - com/mycompany/sample/parser.punch
  - component: syslog_input
    stream: logs

This method is robust and encouraged on production platforms.


The two methods have pros and cons. The package-based solution is best suited for production platforms with strong traceability and version control requirements. The filesystem solution is flexible but demand you protect your configuration folder using git on your own.