Parser Development¶

Abstract

This guide explains how to develop, package and deploy log parsers to a production platform.

Parser versus Punchlet¶

It is important to first understand exactly what a parser is and how it differs from a punchlet.

Punchlets let you write arbitrary functions of small to medium size and complexity, and deploy them in data processing pipelines.

Log management is particular in that you need many of these functions to parse, normalize and enrich the incoming logs. A punch parser refers to one or several such punchlets dedicated to a given log equipment, technology and/or vendor.

Technically speaking, a parser is no different from a set of punchlets, possibly packed with grok patterns or additional configuration files. However, parsers are designed with care, must conform to strict normalisation rules and models and, most importantly, must be designed to be reused.

Note

For these reasons, the rest of this guide refers to parsers. That said, everything explained here is also valid for any kind of punchlet.

Parser Development Toolkit¶

To develop, package, ship and share parsers safely, the punch provides simple yet robust tooling.

Requirements¶

java: 1.8 for punch releases 6.3, 11 for releases 7.x.
maven: v3.6.2 or higher.
the punchplatform-puncher.sh tool: it is shipped with the punch standalone. You need it to develop, run and test your parsers.

The punchplatform-puncher.sh tool can also be downloaded separately from this download area. You should get a jar file named punchplatform-puncher-6.4.5-SNAPSHOT-jar-with-dependencies.jar and the companion startup script punchplatform-puncher.sh.

To execute it, simply type:

punchplatform-puncher.sh

If it is properly installed, it displays its online help.

Setup¶

The punch leverages maven to provide you with a test and packaging toolkit. Here is how it works. Let us assume you want to package a parser for your company, say 'com.mycompany'.

First choose the name of your artefact, say "firewall-parsers". Then use the punch maven archetype generator to create your maven workspace automatically:

mvn archetype:generate \
    -DarchetypeGroupId=org.thales.punch \
    -DarchetypeArtifactId=parser \
    -DarchetypeVersion=1.0.0 \
    -DgroupId=com.mycompany \
    -DartifactId=firewall-parsers

That will generate a fresh startup project. Its layout is as follows:

├── assembly
│   └── assembly.xml
├── pom.xml
├── src
│   └── com
│       └── mycompany
│           └── sample
│               ├── MANIFEST.yml
│               ├── groks
│               │   └── pattern.grok
│               ├── enrich.punch
│               ├── parser.punch
│               ├── resources
│               │   └── color_codes.json
│               └── test
│                   ├── sample.txt
│                   ├── unit_chain.json
│                   └── unit_punchlet.json
└── tools
    └── test.sh

Where:

com/mycompany/sample corresponds to the fully qualified name of your parsers, com.mycompany.sample. That name prefix uniquely identifies your package and the punchlets inside it.
parser.punch and enrich.punch are sample punchlets. Check them out: they illustrate the basics. This is where you write the actual logic of your log parsing or, more generally, of your data transformation.
groks/pattern.grok is a sample grok pattern. The punch comes with many patterns directly loaded, but this shows how you can add your own.
resources/color_codes.json is a sample resource file. In this sample it is used to add a numerical color code from a color string value ('red' or 'green').
test/unit_chain.json and test/unit_punchlet.json are punch unit test files. They let you define unit tests to ensure that each punchlet, or a sequence of punchlets, behaves exactly as you expect.
test/sample.txt is a sample log file. Such files can be used to test a large number of logs.

This project provides a complete yet simple example of what you can do with punchlets.
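The relationship between a fully qualified name and its folder in the layout can be made explicit with a small helper. This is only an illustrative Python sketch: `package_to_path` is a hypothetical name, not part of the punch tooling.

```python
from pathlib import PurePosixPath


def package_to_path(package: str, filename: str = "") -> str:
    """Turn a dotted package name into its relative folder path.

    Hypothetical helper for illustration; not a punch API.
    """
    parts = package.split(".")
    if not all(parts):
        raise ValueError(f"invalid package name: {package!r}")
    path = PurePosixPath(*parts)
    return str(path / filename) if filename else str(path)


print(package_to_path("com.mycompany.sample"))
# prints com/mycompany/sample
print(package_to_path("com.mycompany.sample", "parser.punch"))
# prints com/mycompany/sample/parser.punch
```

The second form gives the kind of relative path used to reference a punchlet from a punchline.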

Test and Package¶

From there simply execute:

cd firewall-parsers

mvn clean test

to only test the punchlets, or

mvn clean install

to test, then package and install the resulting archive. Your punchlet package is located in 'target/firewall-parsers-1.0-SNAPSHOT.zip', as well as in your local maven repository (located under '~/.m2/repository').

Tip

All these are standard maven concepts and goodies. The punch simply leverages maven to produce clean and versioned packages. What is explained here for parsers also applies to other punch (java or python) developments.

Testing¶

As part of punch parser projects, you can write two types of tests that will automatically be executed.

Unit Tests¶

Unit tests are designed to precisely test one punchlet or a chain of punchlets. A unit test is defined in a simple json file expressing the input log and the expected parsed fields after that log has been processed by the punchlet.

The punchplatform-puncher.sh tool automatically plays all the unit test files you include in the test folder of your project. For details refer to the punchplatform-puncher.sh manual page and its online help.
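The idea behind such a test, an input log paired with the fields expected after parsing, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `input` and `expected` keys do not reproduce the real punch unit test file format, and `toy_parser` merely stands in for a punchlet.

```python
import json

# Hypothetical test description: key names are illustrative only and do not
# reproduce the real punch unit test file format.
UNIT_TEST = json.loads("""
{
  "input": "color=red action=deny",
  "expected": {"color": "red", "action": "deny"}
}
""")


def toy_parser(log: str) -> dict:
    """Stand-in for a punchlet: split 'key=value' tokens into a dict."""
    return dict(token.split("=", 1) for token in log.split())


def check(test: dict, parse) -> list:
    """Return the mismatches between expected and parsed fields."""
    parsed = parse(test["input"])
    return [
        f"{field}: expected {want!r}, got {parsed.get(field)!r}"
        for field, want in test["expected"].items()
        if parsed.get(field) != want
    ]


print(check(UNIT_TEST, toy_parser))  # prints []
```

Only the listed fields are compared, so a test stays valid when the parser later produces additional fields.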

Sample Tests¶

Sample tests use sample log files to simply check that all of them are parsed without errors. Sample tests are useful when working with customer log extracts. The punchplatform-puncher.sh tool automatically plays all the sample files you include in the test folder of your project.

For details refer to the punchplatform-puncher.sh manual page and its online help.

Performance Test¶

The punchplatform-puncher.sh tool also provides options to evaluate your punchlet performance. Refer to how to write a robust parser, which takes you through a tutorial to test the performance of the standard Apache parser.

You then obtain a representative EPS (events per second) number on a single thread (on some reference laptop architecture). That number is very useful later on to tune your punchlines to achieve your target throughput.
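As a rough illustration of how a single-thread EPS figure can feed into sizing, the Python sketch below estimates how many parallel threads a target throughput would need. It assumes throughput scales linearly with the number of threads, which is a simplification, and the figures are made up.

```python
import math


def threads_needed(target_eps: float, single_thread_eps: float) -> int:
    """Estimate the threads needed, assuming throughput scales linearly."""
    if target_eps <= 0 or single_thread_eps <= 0:
        raise ValueError("EPS values must be positive")
    return math.ceil(target_eps / single_thread_eps)


# Made-up figures: 20,000 EPS target, 6,000 EPS measured on one thread.
print(threads_needed(20_000, 6_000))  # prints 4
```

In practice the production hardware differs from the reference laptop, so such an estimate is only a starting point to be confirmed by measurement.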

Deploy¶

Filesystem Deployment¶

The first and simplest method is to transfer your punchlets, patterns and resource files to your local per-tenant configuration tree. This topic is explained in detail in the next chapter.

As a preview, what you must do is copy your files to a special punchplatform configuration folder:

└── conf
    └── tenants
        └── sampletenant
            ├── channels
            │   └── samplechannel
            │       ├── channel_structure.yml
            │       └── samplepunchline.yml
            └── resources
                └── punch
                    └── com
                        └── mycompany
                            └── sample
                                ├── MANIFEST.yml
                                ├── groks
                                │   └── pattern.grok
                                ├── enrich.punch
                                ├── parser.punch
                                └── resources
                                    └── color_codes.json

From there you can use your punchlets inside your channels.

Note that with this method, maven is no longer used and you lose the version information. If you use this method, make sure you use git to protect your configuration folder and keep track of your change history.

Important

Always maintain the same tree structure (here 'com/mycompany/sample') in your layout. That tree structure is a universal and recommended pattern used in maven and java. If you conform to that, your punchlines will run fine on other deployed punch platforms.
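The copy itself can be done by hand or scripted. Here is a minimal Python sketch (Python 3.8 or later) that preserves the package tree when copying into the per-tenant resources folder. `deploy_parser` and its parameters are hypothetical names chosen for this example; the punch does not ship such a helper.

```python
import shutil
from pathlib import Path


def deploy_parser(src_root: Path, conf_dir: Path, tenant: str, package_path: str) -> Path:
    """Copy one parser folder into the tenant resource tree, keeping its path.

    Hypothetical helper; the 'test' folder is skipped because it does not
    appear in the deployed layout shown above.
    """
    source = src_root / package_path
    target = conf_dir / "tenants" / tenant / "resources" / "punch" / package_path
    shutil.copytree(
        source,
        target,
        ignore=shutil.ignore_patterns("test"),
        dirs_exist_ok=True,  # allow redeploying over an existing folder
    )
    return target


# Example call, assuming the project and configuration folders exist:
# deploy_parser(Path("firewall-parsers/src"), Path("conf"),
#               "sampletenant", "com/mycompany/sample")
```

Because the same relative path is used on both sides, the 'com/mycompany/sample' structure is kept intact in the target tree.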

Package Based Installation¶

Warning

This method is only supported from release 6.3.4 onwards.

Alternatively, you can ship the maven generated archive directly to the repository folder located in your configuration folder, as illustrated here:

└── conf
    ├── repository
    │   └── firewall-parsers-1.0.0.zip
    └── tenants
        └── sampletenant
            └── channels
                └── samplechannel
                    ├── channel_structure.yml
                    └── samplepunchline.yml

From there you can simply refer to the various resources of your parser in punchlines as follows:

version: "6.3.4"
name: dhcp-parser
runtime: storm
resources:
- punch-parser:com.mycompany:firewall-parsers:1.0.0
dag:
- type: syslog_input
  settings:
    listen:
      proto: tcp
      host: 0.0.0.0
      port: 9902
  publish:
  - stream: logs
    fields:
    - log
- type: punchlet_node
  settings:
    json_resources:
    - com/mycompany/sample/resources/color_codes.json
    punchlets:
    - com/mycompany/sample/parser.punch
  subscribe:
  - component: syslog_input
    stream: logs

This method is robust and encouraged on production platforms.
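The resource reference used in the punchline above follows a 'punch-parser:groupId:artifactId:version' shape. The Python sketch below splits such a reference and derives the archive file name one would expect to find in the repository folder. The `<artifactId>-<version>.zip` naming is an assumption inferred from the example archive shown earlier, and `ParserCoordinate` and `parse_coordinate` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class ParserCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str

    @property
    def archive_name(self) -> str:
        # Assumed naming, matching the archive shown in the repository folder.
        return f"{self.artifact_id}-{self.version}.zip"


def parse_coordinate(resource: str) -> ParserCoordinate:
    """Split 'punch-parser:<group>:<artifact>:<version>' into its parts."""
    fields = resource.split(":")
    if len(fields) != 4 or fields[0] != "punch-parser" or not all(fields):
        raise ValueError(f"unexpected resource reference: {resource!r}")
    return ParserCoordinate(*fields[1:])


coord = parse_coordinate("punch-parser:com.mycompany:firewall-parsers:1.0.0")
print(Path("conf") / "repository" / coord.archive_name)
```

A check of this kind can be handy before starting a punchline, to confirm that the archive it references has actually been shipped.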

Important

The two methods have pros and cons. The package-based solution is best suited for production platforms with strong traceability and version control requirements. The filesystem solution is flexible but demands that you protect your configuration folder using git on your own.