
Parser Development

Abstract

This guide explains how to develop, test and package log parsers.

Parser versus Punchlet

It is important to understand exactly what a parser is and how it differs from a punchlet.

Punchlets let you write arbitrary functions of small to medium size and complexity, which you then deploy in data processing pipelines.

Log management is particular in that you need many of these functions to parse, normalize and enrich the incoming logs. A punch parser refers to one or several such punchlets dedicated to a given log equipment, technology and/or vendor.

Technically speaking, a parser is not different from a set of punchlets, possibly packed with grok patterns or additional configuration files. However, parsers are designed with care. They must conform to strict normalisation rules and models to be reused.

Note

For these reasons, the rest of this guide refers to parsers. That said, everything explained here also applies to any kind of punchlet.

Developing Parser

Requirements

  • Java:
    • 1.8 for punch releases 6.3.X
    • 11 for releases 7.X.
  • Maven: v3.6.2 or higher.
  • punchplatform-puncher.sh. It is shipped with the Punch Standalone and the Punch Console.
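The requirements above can be checked before going further. The following is a minimal sketch, not part of the punch tooling: the helper names version_ge and check_tools are hypothetical, and the version comparison assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```sh
#!/bin/sh
# Hypothetical helpers, not shipped with the punch.

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report which of the required tools are found on the PATH.
check_tools() {
    for tool in java mvn punchplatform-puncher.sh; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}
```

For example, version_ge "$(mvn -v | awk '/Apache Maven/ {print $3; exit}')" 3.6.2 succeeds when the installed Maven meets the minimum listed above, assuming the first line of the mvn -v output has the usual "Apache Maven <version>" form.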

Setup

The Punch Maven Archetype Generator is a Maven Archetype Plugin that will generate a base structure to develop your parser.

Run this command from your PConsole or Standalone to install the Maven Archetype Plugin locally:

punchplatform-development.sh --install

For example, to generate a parser project firewall-parsers for your company com.mycompany:

mvn archetype:generate \
    -DarchetypeGroupId=org.thales.punch \
    -DarchetypeArtifactId=punch-parser-artefact \
    -DarchetypeVersion=1.0.0 \
    -DgroupId=com.mycompany \
    -DartifactId=firewall-parsers

Result :

├── assembly
│   └── assembly.xml
├── metadata
│   └── metadata.yml
├── pom.xml
├── src
│   └── com
│       └── mycompany
│           └── sample
│               ├── MANIFEST.yml
│               ├── README.md
│               ├── groks
│               │   └── pattern.grok
│               ├── enrich.punch
│               ├── parser.punch
│               ├── resources
│               │   └── color_codes.json
│               └── test
│                   ├── sample.txt
│                   └── unit.json
└── tools
    └── test.sh

Where :

  • com/mycompany/sample matches the Fully Qualified Domain Name of your parsers. The com.mycompany.sample prefix uniquely identifies your package and your punchlets. You can nest other folders to obtain an FQDN that suits your needs.
  • MANIFEST.yml explains the parsing chain with related resources for this parser. More information about this file in the next section.
  • README.md is a documentation file presenting the parser characteristics.
  • parser.punch and enrich.punch are sample punchlets. This is where log parsing operations are specified.
  • groks/pattern.grok is a sample grok pattern. The punch comes with many patterns already loaded; this file shows how to add your own.
  • resources/color_codes.json is a sample resource file. In this sample it is used to add a numerical color code from a color string value ('red' or 'green').
  • test/unit.json is a punch unit test file. Unit tests ensure that each punchlet, or a sequence of punchlets, behaves exactly as expected.
  • test/sample.txt is a sample log file. It should contain an exhaustive list of all the log structures the parser must be able to process.

Your parser must respect this file structure.
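A quick way to catch a missing file or folder is to check the layout with a small script. This is a minimal sketch under stated assumptions: check_parser_layout is a hypothetical helper, and the list of entries it checks is simply read off the generated sample above, not taken from an official list of mandatory files.

```sh
#!/bin/sh
# Hypothetical helper, not shipped with the punch.
# Check that a parser directory contains the top-level entries of the
# generated sample (assumed list, read off the sample tree).
# $1: path to the parser directory, e.g. src/com/mycompany/sample
# Prints each missing path and returns non-zero if any is missing.
check_parser_layout() {
    parser_dir=$1
    missing=0
    for path in MANIFEST.yml README.md groks resources test; do
        if [ ! -e "$parser_dir/$path" ]; then
            echo "missing: $parser_dir/$path"
            missing=1
        fi
    done
    return "$missing"
}
```

From the root of the generated project, check_parser_layout src/com/mycompany/sample prints nothing and succeeds when all entries are present.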

MANIFEST.yml

The MANIFEST.yml file describes the parsing chain with related resources. In the previous example, the chain is the following:

  1. Punchlet parser.punch takes an input stream logs with a field data. It requires pattern.grok to work.
  2. Punchlet enrich.punch takes an input stream logs with a field data. It requires color_codes.json to work.

Warning

There is no outputStream field for each punchlet. The output data structure should be the same as the input data structure. If your punchlet takes an input stream logs with a field data, it should publish an output stream logs with a field data.

The parsing chain described in the MANIFEST.yml is also used for testing your punchlets.

Testing Parser

Run Tests

Tests are run through a Maven goal:

mvn clean test

In the background, punchplatform-puncher.sh is called. It reads the MANIFEST.yml and applies the parsing chain described there. For details, refer to the punchplatform-puncher.sh manual page and its online help.

Unit Tests

Unit tests are designed to test precisely one punchlet or a chain of punchlets. A unit test is written as a simple json file expressing the input log and the fields expected once that log has been processed by the punchlet.

In the generated example, the unit tests use the input and expected output data in test/unit.json.

Sample Tests

Sample tests use sample log files to check that every log in them is parsed without errors. Sample tests are useful when working with real log extracts. The punchplatform-puncher.sh tool automatically plays all sample files you include in your test project folder.

In the generated example, the sample test parses test/sample.txt.

Performance Tests

punchplatform-puncher.sh also provides options to evaluate the performance of your punchlets. Refer to how to write a robust parser, a tutorial that walks you through testing the performance of the standard Apache parser.

You will then obtain a representative EPS number on a single thread (on some reference laptop architecture). That number is very useful later on to tune your punchlines and achieve your target throughput.

Packaging Parser

Packaging is another Maven goal :

mvn clean install

Your packaged artefact will be available at two locations:

  1. In target/firewall-parsers-1.0-SNAPSHOT.zip.
  2. In ~/.m2/repository/com/mycompany/firewall-parsers/1.0-SNAPSHOT/firewall-parsers-1.0-SNAPSHOT.zip.
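Both locations follow from the Maven coordinates of the project. The sketch below derives them for any groupId, artifactId and version; artefact_paths is a hypothetical helper, and it assumes the default local Maven repository under ~/.m2/repository, which a custom Maven settings file can relocate.

```sh
#!/bin/sh
# Hypothetical helper, not shipped with the punch.
# Print the two locations where the packaged zip is expected.
# Assumes the default local Maven repository ($HOME/.m2/repository).
# $1: groupId, $2: artifactId, $3: version
artefact_paths() {
    # Maven stores artefacts under the groupId with dots turned into slashes.
    group_path=$(printf '%s' "$1" | tr '.' '/')
    zip_name="$2-$3.zip"
    echo "target/$zip_name"
    echo "$HOME/.m2/repository/$group_path/$2/$3/$zip_name"
}
```

Calling artefact_paths com.mycompany firewall-parsers 1.0-SNAPSHOT prints the two paths listed above, with ~ expanded to your home directory.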

See the next page, Parser Deployment, for how to use those parsers.