This guide explains how to develop, package and deploy log parsers to a production platform.
Parser versus Punchlet¶
It is important to first understand exactly what a parser is and how it differs from a punchlet.
Using punchlets, you can write arbitrary functions of small to medium size and complexity and deploy them in data processing pipelines.
Log management is particular in that you need many of these functions to parse, normalize and enrich the incoming logs. A punch parser refers to one or several such punchlets dedicated to a given log equipment, technology and/or vendor.
Technically speaking, a parser is no different from a set of punchlets, possibly packed with grok patterns or additional configuration files. However, parsers are designed with care, must conform to strict normalisation rules and models and, most importantly, must be designed to be reused.
For these reasons, the rest of this guide refers to parsers. That said, whatever is explained is also valid for any kind of punchlet.
Parser Development Toolkit¶
To develop, package, ship and share parsers safely, the punch provides simple yet robust tooling. It requires:
- java: 1.8 for punch releases 6.3, java 11 for releases 7.x.
- maven: v3.6.2 or higher.
- the punchplatform-puncher.sh tool. It is shipped with the punch standalone. You need it to develop, run and test your parsers.
The punchplatform-puncher.sh tool can also be downloaded separately.
Download the jar file from this download area.
You should get a jar file named punchplatform-puncher-6.4.4-SNAPSHOT-jar-with-dependencies.jar and the companion startup script.
To execute it, run that startup script from a terminal. If it is properly installed, it displays its online help.
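Before going further, it can help to confirm that the prerequisites listed above are reachable from your shell. The following is only a minimal sketch: `missing_tools` is a hypothetical helper name, and it checks only that each command is on the PATH, not that its version is the required one.

```shell
# missing_tools TOOL...: print every tool that is not found on the PATH.
# Returns 0 when all tools are present, 1 otherwise.
# (Hypothetical helper, not part of the punch toolkit.)
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Typical use:
#   missing_tools java mvn punchplatform-puncher.sh && echo "toolkit ready"
```

Version requirements (java 1.8 or 11, maven 3.6.2 or higher) still have to be checked by hand, for instance with `java -version` and `mvn -version`.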
The punch leverages maven to provide you with a test and packaging toolkit. Here is how it works. Let us assume you want to package a parser for your company, say 'com.mycompany'.
First choose the name of your artefact, say "firewall-parsers". Use the punch maven archetype generator to create your maven workspace automatically:
mvn archetype:generate \
    -DarchetypeGroupId=org.thales.punch \
    -DarchetypeArtifactId=parser \
    -DarchetypeVersion=1.0.0 \
    -DgroupId=com.mycompany \
    -DartifactId=firewall-parsers
That will generate a fresh startup project. Its layout is as follows:
├── assembly
│   └── assembly.xml
├── pom.xml
├── src
│   └── com
│       └── mycompany
│           └── sample
│               ├── MANIFEST.yml
│               ├── groks
│               │   └── pattern.grok
│               ├── enrich.punch
│               ├── parser.punch
│               ├── resources
│               │   └── color_codes.json
│               └── test
│                   ├── sample.txt
│                   ├── unit_chain.json
│                   └── unit_punchlet.json
└── tools
    └── test.sh
- com/mycompany/sample corresponds to the fully qualified name of your parsers, com.mycompany.sample. That name prefix uniquely identifies your package and the punchlets inside it.
- parser.punch and enrich.punch are sample punchlets. Check them out, they illustrate the basics. This is where you write the actual logic of your log parsing or, more generally, of your data transformation.
- groks/pattern.grok is a sample grok pattern. The punch comes with many patterns directly loaded, but this shows how you can add your own.
- resources/color_codes.json is a sample resource file. In this sample it is used to add a numerical color code from a color string value ('red' or 'green').
- test/unit_punchlet.json and test/unit_chain.json are punch unit test files. They let you define unit tests to ensure that each punchlet, or a sequence of punchlets, behaves exactly as you expect.
- test/sample.txt is a sample log file. Such files can be used to test a large number of logs.
It provides a complete yet simple example of what you can do with punchlets.
Test and Package¶
From there, simply execute:
mvn clean test
mvn clean install
All of these are standard maven concepts and goodies. The punch simply leverages maven to produce clean and versioned packages. What is explained here for parsers applies equally to other punch (java or python) development.
As part of punch parser projects, you can write two types of tests that will automatically be executed.
Unit tests are designed to test precisely one punchlet or a chain of punchlets. A unit test is written as a simple json file expressing the input log and the expected parsed fields after that log has been processed by the punchlet.
The punchplatform-puncher.sh tool automatically plays all the unit test files you include in your project test folder.
Sample tests use sample log files to simply check that all of them are parsed without errors. Sample tests are useful when working with customer log extracts. The punchplatform-puncher.sh tool automatically plays all the sample files you include in your project test folder.
For details on both kinds of tests, refer to the punchplatform-puncher.sh manual page and its online help.
The punchplatform-puncher.sh tool also provides options to evaluate your punchlet performance. Refer to how to write a robust parser: it takes you through a tutorial that tests the performance of our standard Apache parser.
You will then obtain a representative EPS (events per second) number on a single thread, measured on some reference laptop architecture. That number is very useful to later tune your punchlines to achieve your target throughput.
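As a rough illustration of how a single-thread EPS number feeds into sizing, the sketch below estimates how many parsing threads a target throughput would need. Both figures are made up, and the calculation assumes throughput scales linearly with the number of threads, which a real platform only approximates.

```shell
# Hypothetical figures, not measurements.
single_thread_eps=12000   # EPS obtained for one thread with the puncher
target_eps=50000          # throughput the punchline must sustain

# Ceiling division: round up so the estimate never undershoots the target.
threads=$(( (target_eps + single_thread_eps - 1) / single_thread_eps ))
echo "estimated threads: $threads"
```

Treat the result as a starting point for tuning rather than a guarantee, since input and output nodes also consume resources.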
Once your parser is tested, there are two ways to deploy it. The first and simplest method is to transfer your punchlets, patterns and resource files to your local per-tenant configuration tree. This topic is explained in detail in the next chapter.
As a preview, what you must do is copy your files to a special punchplatform configuration folder:
└── conf
    └── tenants
        └── sampletenant
            ├── channels
            │   └── samplechannel
            │       ├── channel_structure.yml
            │       └── samplepunchline.yml
            └── resources
                └── punch
                    └── com
                        └── mycompany
                            └── sample
                                ├── MANIFEST.yml
                                ├── groks
                                │   └── pattern.grok
                                ├── enrich.punch
                                ├── parser.punch
                                └── resources
                                    └── color_codes.json
From there you can use your punchlets inside your channels.
Note that with this method maven is no longer used, so you lose the version information. If you use this method, make sure you use git to protect your configuration folder and keep track of your change history.
Always maintain the same tree structure (here 'com/mycompany/sample') in your layout. That tree structure is a universal and recommended pattern used in maven and java. If you conform to that, your punchlines will run fine on other deployed punch platforms.
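The filesystem copy described above can be sketched as a small shell function. This is only an illustration under stated assumptions: `install_parser` is a hypothetical helper, the project layout is the one generated by the archetype, and the package path `com/mycompany/sample` is hard-coded for the example.

```shell
# install_parser PROJECT_DIR CONF_DIR TENANT
# Copy the manifest, punchlets, grok patterns and resources of the sample
# parser into the tenant resource tree. The test folder is left behind.
# (Hypothetical helper, not part of the punch toolkit.)
install_parser() {
    pkg_path="com/mycompany/sample"
    src="$1/src/$pkg_path"
    dest="$2/tenants/$3/resources/punch/$pkg_path"

    mkdir -p "$dest" || return 1
    cp "$src/MANIFEST.yml" "$src"/*.punch "$dest/" || return 1
    cp -R "$src/groks" "$src/resources" "$dest/" || return 1
}

# Typical use, from the folder containing both the project and conf:
#   install_parser ./firewall-parsers ./conf sampletenant
```

Because the package path is preserved on both sides, the punchlines that reference `com/mycompany/sample/...` keep working unchanged.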
Package Based Installation¶
This method is supported only in release 6.3.4 and higher.
Alternatively, you can ship the maven generated archive directly to the repository folder located in your configuration folder, as illustrated here:
└── conf
    ├── repository
    │   └── firewall-parsers-1.0.0.zip
    └── tenants
        └── sampletenant
            └── channels
                └── samplechannel
                    ├── channel_structure.yml
                    └── samplepunchline.yml
From there, you can simply refer to your parser's various resources in punchlines as follows:
version: "6.3.4"
name: dhcp-parser
runtime: storm
resources:
  - punch-parser:com.mycompany:firewall-parsers:1.0.0
dag:
  - type: syslog_input
    settings:
      listen:
        proto: tcp
        host: 0.0.0.0
        port: 9902
    publish:
      - stream: logs
        fields:
          - log
  - type: punchlet_node
    settings:
      json_resources:
        - com/mycompany/sample/resources/color_codes.json
      punchlets:
        - com/mycompany/sample/parser.punch
    subscribe:
      - component: syslog_input
        stream: logs
This method is robust and encouraged on production platforms.
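The package-based steps can be sketched in shell as well. Both functions below are hypothetical helpers written for illustration: the first copies an archive into the repository folder, the second prints the resource entry in the `punch-parser:groupId:artifactId:version` form used in the punchline above. Where maven writes the archive in your project is an assumption you should check against your own build output.

```shell
# publish_parser ARCHIVE CONF_DIR
# Copy a packaged parser archive into the repository folder of the
# configuration tree. (Hypothetical helper.)
publish_parser() {
    mkdir -p "$2/repository" || return 1
    cp "$1" "$2/repository/" || return 1
}

# parser_resource GROUP_ID ARTIFACT_ID VERSION
# Print the resource entry to list under 'resources:' in a punchline.
parser_resource() {
    printf 'punch-parser:%s:%s:%s\n' "$1" "$2" "$3"
}

# Typical use (the archive path is an assumed build output location):
#   publish_parser ./firewall-parsers/target/firewall-parsers-1.0.0.zip ./conf
#   parser_resource com.mycompany firewall-parsers 1.0.0
```

Deriving the resource entry from the same group, artefact and version used at packaging time keeps the punchline and the shipped archive in step.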
The two methods have pros and cons. The package-based solution is best suited for production platforms with strong traceability and version control requirements. The filesystem solution is flexible but demands that you protect your configuration folder with git on your own.