Parser Development¶
Abstract
This guide explains how to develop, package and deploy log parsers to a production platform.
Parser versus Punchlet¶
It is important to first understand what exactly a parser is and how it differs from a punchlet.
Using punchlets you can write arbitrary functions of small to medium size and complexity, which you deploy in data processing pipelines.
Log management is particular in that you need many of these functions to parse, normalize and enrich the incoming logs. A punch parser refers to one or several such punchlets dedicated to a given log equipment, technology and/or vendor.
Technically speaking, a parser is no different from a set of punchlets, possibly packed with grok patterns or additional configuration files. However, parsers are designed with care, must conform to strict normalisation rules and models and, most importantly, must be designed to be reused.
Note
For these reasons, the rest of this guide refers to parsers. That said, whatever is explained here also applies to any kind of punchlet.
Parser Development Toolkit¶
To develop, package, ship and share parsers safely, the punch provides simple yet robust tooling.
Requirements¶
- java: 1.8 for punch releases 6.3, java 11 for releases 7.x.
- maven: v3.6.2 or higher.
- the punchplatform-puncher.sh tool. It is shipped with the punch standalone. You need it to develop, run and test your parsers.
The punchplatform-puncher.sh tool can also be downloaded separately.
Download the jar file from this download area. You should get a jar file named punchplatform-puncher-6.4.5-jar-with-dependencies.jar and the companion startup script punchplatform-puncher.sh.
To execute it simply type in
punchplatform-puncher.sh
If it is properly installed it will display its online help.
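Before going further, you may want to check the requirements listed above from your terminal. The following is a minimal sketch, not part of the punch tooling: the version_ge helper is a hypothetical name introduced here, and it assumes a `sort` command that supports the `-V` (version sort) option, as GNU coreutils does.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Hypothetical helper; relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Minimum maven version taken from the requirements above.
required_maven="3.6.2"

if command -v mvn >/dev/null 2>&1; then
  # The first line of `mvn -v` looks like "Apache Maven 3.6.3 (...)".
  installed_maven="$(mvn -v 2>/dev/null | awk '/^Apache Maven/ {print $3; exit}')"
  if version_ge "$installed_maven" "$required_maven"; then
    echo "maven $installed_maven: OK"
  else
    echo "maven $installed_maven is older than $required_maven"
  fi
else
  echo "maven not found on PATH"
fi

# Display the java version so you can compare it with your punch release.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found on PATH"
fi
```

The script only reports what it finds; it does not install or change anything.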
Setup¶
The punch leverages maven to provide you with a test and packaging toolkit. Here is how it works. Let us assume you want to package a parser for your company, say 'com.mycompany'.
First choose the name of your artefact, say "firewall-parsers". Use the punch maven archetype generator to create your maven workspace automatically:
mvn archetype:generate \
-DarchetypeGroupId=org.thales.punch \
-DarchetypeArtifactId=parser \
-DarchetypeVersion=1.0.0 \
-DgroupId=com.mycompany \
-DartifactId=firewall-parsers
That will generate a fresh startup project. Its layout is as follows:
├── assembly
│   └── assembly.xml
├── pom.xml
├── src
│   └── com
│       └── mycompany
│           └── sample
│               ├── MANIFEST.yml
│               ├── groks
│               │   └── pattern.grok
│               ├── enrich.punch
│               ├── parser.punch
│               ├── resources
│               │   └── color_codes.json
│               └── test
│                   ├── sample.txt
│                   ├── unit_chain.json
│                   └── unit_punchlet.json
└── tools
    └── test.sh
Where:
- com/mycompany/sample corresponds to the fully qualified name of your parsers, com.mycompany.sample. That name prefix uniquely identifies your package and the punchlets inside it.
- parser.punch and enrich.punch are sample punchlets. Check them out: they illustrate the basics. This is where you write the actual logic of your log parsing or, more generally, data transformation.
- groks/pattern.grok is a sample grok pattern. The punch comes with many patterns directly loaded, but this file shows how you can add your own.
- resources/color_codes.json is a sample resource file. In this sample it is used to add a numerical color code from a color string value ('red' or 'green').
- test/unit_chain.json and test/unit_punchlet.json are punch unit test files. They let you define unit tests to ensure that each punchlet, or a sequence of punchlets, behaves exactly as you expect.
- test/sample.txt is a sample log file. Such files can be used to test a large number of logs.
This generated project provides a complete yet simple example of what you can do with punchlets.
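The relation between the fully qualified parser name and its folder is the usual maven and java one: each dot becomes a path separator. As a small illustration, assuming the sample package of the generated layout (the variable names are only illustrative):

```shell
# Fully qualified name of the sample parser package.
parser_package="com.mycompany.sample"

# Replace each dot with a slash to obtain the relative folder.
parser_dir="$(printf '%s' "$parser_package" | tr '.' '/')"

# Path of the sample punchlet inside the generated workspace.
echo "src/$parser_dir/parser.punch"
# prints src/com/mycompany/sample/parser.punch
```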
Test and Package¶
From there, simply execute:
cd firewall-parsers
mvn clean test
mvn clean install
Tip
All these are standard maven concepts and goodies. The punch simply leverages maven to produce clean and versioned packages. What is explained here for parsers applies similarly to other punch (java or python) developments.
Testing¶
As part of punch parser projects, you can write two types of tests that will automatically be executed.
Unit Tests¶
Unit tests are designed to test precisely one punchlet or a chain of punchlets. A unit test is written as a simple json file expressing the input log and the expected parsed fields after that log has been processed by the punchlet(s).
The punchplatform-puncher.sh tool will automatically play all unit test files you include in your project test folder. For details refer to the punchplatform-puncher.sh manual page and its online help.
Sample Tests¶
Sample tests use sample log files to simply check that all of them are parsed without errors. Sample tests are useful when working with customer log extracts. The punchplatform-puncher.sh tool will automatically play all sample files you include in your project test folder.
For details refer to the punchplatform-puncher.sh manual page and its online help.
Performance Test¶
The punchplatform-puncher.sh tool also provides options to evaluate the performance of your punchlets. Refer to how to write a robust parser: it takes you through a tutorial that tests the performance of our standard Apache parser.
You will then obtain a representative EPS (events per second) number on a single thread (on some reference laptop architecture). That number is very useful later on to tune your punchlines to achieve your target throughput.
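As a rough illustration of how that single-thread number can feed your sizing, the sketch below estimates a thread count for a target throughput. Both figures are hypothetical, and the computation assumes that throughput scales linearly with the number of threads, which you should verify on your own platform.

```shell
# Hypothetical figures: replace them with your own measurement and target.
eps_per_thread=5000    # EPS measured on a single thread
target_eps=22000       # throughput the punchline must sustain

# Ceiling division: smallest thread count whose combined EPS reaches the target,
# assuming linear scaling across threads.
threads=$(( (target_eps + eps_per_thread - 1) / eps_per_thread ))

echo "threads needed: $threads"
# prints threads needed: 5
```

Treat the result as a starting point for tuning rather than a guarantee.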
Deploy¶
Filesystem Deployment¶
The first and simplest method is to transfer your punchlets, patterns and resource files to your local per-tenant configuration tree. This topic is explained in detail in the next chapter.
As a preview, what you must do is copy your files to a special punchplatform configuration folder:
└── conf
    └── tenants
        └── sampletenant
            ├── channels
            │   └── samplechannel
            │       ├── channel_structure.yml
            │       └── samplepunchline.yml
            └── resources
                └── punch
                    └── com
                        └── mycompany
                            └── sample
                                ├── MANIFEST.yml
                                ├── groks
                                │   └── pattern.grok
                                ├── enrich.punch
                                ├── parser.punch
                                └── resources
                                    └── color_codes.json
From there you can use your punchlets inside your channels.
Note that with this method maven is no longer used, so you lose the version information. If you use this method, make sure you use git to protect your configuration folder and keep track of your change history.
Important
Always maintain the same tree structure (here 'com/mycompany/sample') in your layout. That tree structure is a universal and recommended pattern used in maven and java. If you conform to that, your punchlines will run fine on other deployed punch platforms.
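The copy step of the filesystem deployment can be sketched as a small shell function. This is only an illustration: the deploy_parser name and its arguments are hypothetical, and the location of your configuration folder depends on your installation. The function keeps the package tree intact and drops the test folder, which does not appear in the target layout shown above.

```shell
# deploy_parser SRC_DIR PACKAGE_DIR CONF_DIR TENANT
# Hypothetical helper: copies one parser package into the tenant's
# resources/punch folder, preserving the package tree (e.g. com/mycompany/sample).
deploy_parser() {
  src_dir="$1"
  package_dir="$2"
  conf_dir="$3"
  tenant="$4"

  dest="$conf_dir/tenants/$tenant/resources/punch/$package_dir"
  mkdir -p "$dest"

  # The trailing "/." copies the content of the package folder, not the folder itself.
  cp -R "$src_dir/$package_dir/." "$dest/"

  # The deployed layout contains no test folder, so remove the copied one.
  rm -rf "$dest/test"
}

# Example call (paths are assumptions, adapt them to your platform):
# deploy_parser src com/mycompany/sample /path/to/conf sampletenant
```

If you rely on this method, commit the resulting changes to the git repository that protects your configuration folder.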
Package Based Installation¶
Warning
This method is supported only from release 6.3.4 and higher.
Alternatively, you can ship the maven generated archive directly to the repository folder located in your configuration folder, as illustrated here:
└── conf
    ├── repository
    │   └── firewall-parsers-1.0.0.zip
    └── tenants
        └── sampletenant
            └── channels
                └── samplechannel
                    ├── channel_structure.yml
                    └── samplepunchline.yml
From there you can simply refer to the various resources of your parser in punchlines as follows:
version: "6.3.4"
name: dhcp-parser
runtime: storm
resources:
  - punch-parser:com.mycompany:firewall-parsers:1.0.0
dag:
  - type: syslog_input
    settings:
      listen:
        proto: tcp
        host: 0.0.0.0
        port: 9902
    publish:
      - stream: logs
        fields:
          - log
  - type: punchlet_node
    settings:
      json_resources:
        - com/mycompany/sample/resources/color_codes.json
      punchlets:
        - com/mycompany/sample/parser.punch
    subscribe:
      - component: syslog_input
        stream: logs
This method is robust and encouraged on production platforms.
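Shipping the archive to the repository folder amounts to a plain file copy. The sketch below shows one way to script it; the install_parser_package function is a hypothetical helper, and the target/ location of the archive in the example call is an assumption based on the default maven build directory.

```shell
# install_parser_package ARCHIVE CONF_DIR
# Hypothetical helper: copies a maven generated parser archive into the
# repository folder of the given configuration folder.
install_parser_package() {
  archive="$1"
  conf_dir="$2"

  # Fail early with a clear message if the archive has not been built.
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi

  mkdir -p "$conf_dir/repository"
  cp "$archive" "$conf_dir/repository/"
}

# Example call (both paths are assumptions, adapt them to your platform):
# install_parser_package target/firewall-parsers-1.0.0.zip /path/to/conf
```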
Important
The two methods have pros and cons. The package-based solution is best suited for production platforms with strong traceability and version control requirements. The filesystem solution is flexible but demands that you protect your configuration folder with git on your own.