Tutorial Write a Production Grade Parser¶
You have succeeded in writing a first decent parser. The next step is to bring it in line with developer best practices, so that you can maintain your code and publish it to the community.
After reading this chapter, you will be able to submit a parser to the PunchPlatform standard_log_parsers.
Make it modular¶
The punchlet we have so far is compact. But it does several different things you are likely to repeat for every parser:
- extract the timestamp from the header
- store the original log in a field for later archiving
- parse vendor specific fields
- normalize
- cleanup
- etc ...
Copying and pasting a piece of code to produce another parser is NOT a good idea. The day you decide to change something in one parser, you will have to repeat the change in every parser. It will never work in the long run.
Instead, we will split our punchlet into three. The first one takes care of the [ [message ]] issue, plus a few additional goodies: it records an input timestamp to keep track of when the log arrived in the PunchPlatform, adds a unique id, and so on. That punchlet already exists; you simply need to reuse it. It is located in:
$PUNCHPLATFORM_CONF_DIR/resource/punch/standard/common/input.punch
Next, we take care of the timestamp header. Again, a punchlet already exists:
$PUNCHPLATFORM_CONF_DIR/resource/punch/standard/common/parsing_syslog_header.punch
Last, we put the rest (the parsing part) in a parsing punchlet:
$PUNCHPLATFORM_CONF_DIR/resource/punch/standard/arkoon_ngfw/parsing.punch
Instead of one (punchlet) function, we now have three. We simply need to chain these in the log flow.
Conclusion: it is best to separate the parsing of your log from the handling of its headers, one punchlet each, so that each piece is reusable.
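To picture what chaining buys you, here is a minimal Java sketch of three small stages applied in order to the same document. It is only an illustration of the idea, not how the PunchPlatform runs punchlets: `PunchletChain`, the three stage constants and the field names (`log`, `raw_log`, `ts`, `action`) are hypothetical stand-ins for input.punch, parsing_syslog_header.punch and the vendor parsing punchlet, and the header handling is deliberately simplified to "first token is the timestamp".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class PunchletChain {
    // Hypothetical stand-in for input.punch: keep the original log for archiving.
    static final UnaryOperator<Map<String, String>> INPUT = doc -> {
        doc.put("raw_log", doc.get("log"));
        return doc;
    };

    // Hypothetical stand-in for parsing_syslog_header.punch (simplified:
    // the first token is treated as the timestamp, the rest is the body).
    static final UnaryOperator<Map<String, String>> SYSLOG_HEADER = doc -> {
        String log = doc.get("log");
        int space = log.indexOf(' ');
        if (space < 0) {
            throw new IllegalArgumentException("no header found in: " + log);
        }
        doc.put("ts", log.substring(0, space));
        doc.put("log", log.substring(space + 1));
        return doc;
    };

    // Hypothetical stand-in for the vendor-specific parsing punchlet.
    static final UnaryOperator<Map<String, String>> VENDOR_PARSING = doc -> {
        doc.put("action", doc.get("log").split(" ")[0]);
        return doc;
    };

    // The chain itself: the common stages can be reused by every parser,
    // only the last one changes from one vendor to the next.
    static final List<UnaryOperator<Map<String, String>>> CHAIN =
        Arrays.asList(INPUT, SYSLOG_HEADER, VENDOR_PARSING);

    static Map<String, String> run(String line) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("log", line);
        for (UnaryOperator<Map<String, String>> stage : CHAIN) {
            doc = stage.apply(doc);
        }
        return doc;
    }
}
```

Swapping the last stage for another vendor's parsing leaves the two common stages untouched, which is the point of the split.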
Manage Parsing error¶
Think about parsing errors!
Each grok, csv, dissect or kv operator must be protected against parsing errors with:
{
throw new PunchRuntimeException("ADAPT YOUR MESSAGE")
}
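The idea is that a failed match must never pass silently: check the result of each operator and raise an error that says which step failed. As a language-neutral illustration, here is a minimal Java sketch of that guard. `KvGuard`, `parseKv` and `ParsingException` are hypothetical names standing in for a Punch operator and for PunchRuntimeException; they are not PunchPlatform APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KvGuard {
    /** Hypothetical stand-in for PunchRuntimeException. */
    static final class ParsingException extends RuntimeException {
        ParsingException(String message) {
            super(message);
        }
    }

    /**
     * Parses "k1=v1 k2=v2" into out.
     * Returns false as soon as a token is not of the form key=value.
     */
    static boolean parseKv(String input, Map<String, String> out) {
        if (input == null || input.trim().isEmpty()) {
            return false;
        }
        for (String token : input.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq <= 0 || eq == token.length() - 1) {
                return false;
            }
            out.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return true;
    }

    /** The guard: a failed parse raises an error with an explicit message. */
    static Map<String, String> parseOrThrow(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (!parseKv(input, fields)) {
            throw new ParsingException("kv parsing failed on: " + input);
        }
        return fields;
    }
}
```

A message that names the failing step and includes the offending input makes the error much easier to trace once several punchlets are chained.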
Benchmarking¶
Is your punchlet efficient? The only way to find out is to test it with a benchmark. To do so, we will rely on the punchplatform-log-injector.sh utility.
By convention, we write performance tests in $PUNCHPLATFORM_CONF_DIR/resources/injector/<tenant>/perf.
To have a reproducible test, we will create a short shell script. To test the standard apache_httpd parser, we will set up a test in these 2 steps.
1) Write a local_parser that copies [logs][raw_log] into [logs][log]. To do so, paste this code into a new file called /resources/punch/standard/common/local_for_test_perf.punch:
{
[logs][log] = [logs][raw_log];
}
2) Test the parser performance using this command:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json
Let's see how this script works:
- We call the punchplatform-log-injector.sh script; it does the heavy lifting.
- The -c option sets the injector file to use for input log generation.
- The --punchlets option is a comma-separated list of the punchlets used in our pipeline.
Now run this script and see what happens:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json
registering punchlet: standard/common/local_for_test_perf.punch
registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering punchlet: standard/apache_httpd/taxonomy.json
registering punchlet: standard/apache_httpd/http_codes.json
registering groks from /home/gmfmi/punchplatform/standalone/punch-standalone-5.1.2/conf/resources/punch/patterns
compiling ...
punchlets compiled
running punchlets using infinite loop
running punchlet at maximum throughput
[Tue Feb 19 10:28:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Tue Feb 19 10:28:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2 sent-msg : 1065 rate (1/s): 532.5
[Tue Feb 19 10:28:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4 sent-msg : 24068 rate (1/s): 11497.0
[Tue Feb 19 10:28:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6 sent-msg : 55976 rate (1/s): 15944.5
[Tue Feb 19 10:28:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8 sent-msg : 89516 rate (1/s): 16768.0
[Tue Feb 19 10:29:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10 sent-msg : 123541 rate (1/s): 17011.0
[Tue Feb 19 10:29:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12 sent-msg : 158056 rate (1/s): 17247.4
[Tue Feb 19 10:29:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14 sent-msg : 192624 rate (1/s): 17278.5
[Tue Feb 19 10:29:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16 sent-msg : 227265 rate (1/s): 17319.5
We are now in an infinite loop that tries to process as many logs as possible. This way, we can test the performance of an end-to-end punchlet pipeline. To get the best result, a good practice is to let the script run for at least 5 minutes (as a warm-up) and then take the rate it settles at as your real result.
Important
This benchmark depends heavily on your testing platform setup. The tool is not meant to give an absolute or reference rate, but to compare the behaviour of a new parser with that of others. It is also helpful for measuring a parser's performance after a refactoring or an improvement.
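To compare two runs, it helps to reduce the injector output to a single figure. One possible approach, sketched below in Java, is to drop the first samples as warm-up and average the remaining `rate (1/s)` values. The sketch relies only on the line format shown in the sample output above; the class name `RateSummary` and the number of warm-up samples to skip are assumptions you should adapt.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RateSummary {
    // Matches the trailing "rate (1/s): 17011.0" part of an injector line.
    private static final Pattern RATE =
        Pattern.compile("rate \\(1/s\\):\\s*([0-9]+(?:\\.[0-9]+)?)");

    /**
     * Averages the reported rates, ignoring lines without a rate and
     * skipping the first warmupSamples rate lines.
     * Returns an empty result if no sample remains.
     */
    static OptionalDouble steadyStateRate(List<String> lines, int warmupSamples) {
        return lines.stream()
            .map(RATE::matcher)
            .filter(Matcher::find)
            .skip(warmupSamples)
            .mapToDouble(m -> Double.parseDouble(m.group(1)))
            .average();
    }
}
```

Run the same computation on the output of the old and the new version of a parser, on the same machine, and compare the two averages rather than reading them as absolute throughput.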
Going to production¶
Before going to production, you must write unit tests. Several alternatives are available:
- punchplatform-log-injector.sh
- punchlinectl
Test with a punchplatform-log-injector¶
1) Write a log injector configuration file
{
  "destination" : { "proto" : "tcp", "host" : "127.0.0.1", "port" : 9901 },
  "load" :{
    "total_messages" : 1,
    "stats_publish_interval" : "1s",
    "message_throughput" : 1
  },
  "message" : {
    "payloads" : [
      "foo bar"
    ],
    "fields" : {
    }
  }
}
2) Write your parser
{
[logs][log] = [logs][raw_log];
dissect("%{?a} %{&a}").on([logs][log]).into([dissect]);
}
3) Run the test
punchplatform-log-injector.sh -c injector_conf.json --punchlets parser_to_test.punch
The punchlets must be placed under resource/punch.
Test with punchplatform-topology¶
1) Write a test topology
{
  "dag": [
    {
      "type": "generator_input",
      "settings": {
        "messages": [
          "{\"type\":\"fw\",\"obs\":{\"ts\":\"2018-11-12T23:56:03.000Z\"},\"init\":{\"host\":{\"ip\":\"127.0.0.1\"}},\"target\":{\"host\":{\"port\":1}}}"
        ]
      },
      "storm_settings": {
        "component": "generator"
      }
    },
    {
      "type": "punchlet_node",
      "settings": {
        "punchlet": "./your_punchlet.punch"
      },
      "storm_settings": {
        "component": "punchlet_node",
        "subscribe": [
          {
            "component": "generator"
          }
        ]
      }
    },
    {
      "type": "punchlet_node",
      "bolts_settings": {
        "punchlet_code": "{print(root);}"
      },
      "storm_settings": {
        "component": "print_node",
        "subscribe": [
          {
            "component": "punchlet_node"
          }
        ]
      }
    }
  ]
}
2) Run the topology
punchlinectl parsing_test_topology.json
After checking that every unit test passes, set up a channel in your standalone that can handle your real logs (a scheme should do the job, see the examples in your Standalone) and try sending some of them into it. Check in Kibana that all your logs look as you expected; you can then put your punchlets in the real flow.
Contributing¶
The PunchPlatform needs you just as you need the PunchPlatform. Your work is extremely valuable, having a standard base of punchlet parsers is our best asset.
- If you use a standard log parser, you are assured of support if a problem occurs. So if you manage to make your parser standard (or to update an existing one), you get help from experts;
- If you need a new parser, first check the standard_log_parsers chapter. Who knows, a teammate may already have written it. By contributing, you take part in this virtuous circle.
Make development unit test¶
All standard parsers are located in the pp-resources repository.
And then:
cd standard-resources
1) Create your parser
ls -l resources/punch/standard/apache_httpd/
-rw-r--r-- 1 loicjardin staff 216 Nov 22 08:46 enrichment.punch
-rw-r--r-- 1 loicjardin staff 306 Nov 22 08:46 enrichment_useragents.punch
-rw-r--r-- 1 loicjardin staff 1624 Nov 22 08:46 http_codes.json
-rw-r--r-- 1 loicjardin staff 509 Nov 22 08:46 normalization.punch
-rw-r--r-- 1 loicjardin staff 2796 Nov 22 08:46 parsing.punch
-rw-r--r-- 1 loicjardin staff 27396 Nov 22 08:46 taxonomy.json
2) Create dev unit resources
ls -l src/test/resources/standard/apache_httpd/
total 24
-rw-r--r-- 1 loicjardin staff 441 Nov 22 08:46 sample_1.txt
-rw-r--r-- 1 loicjardin staff 1027 Nov 22 08:46 unit_1.json
-rw-r--r-- 1 loicjardin staff 595 Nov 22 08:46 unit_2.json
For example:
cat src/test/resources/standard/apache_httpd/unit_1.json
{
  "input": {
    "message": "168.168.168.168 - - [18/Aug/2011:06:00:14 -0700] \"GET /style2.css HTTP/1.1\" 200 659433 \"http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html\" \"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\""
  },
  "output": {
    "includes": {
      "app": {
        "method": "GET",
        "return": {
          "code": "200"
        }
      },
      "init": {
        "host": {
          "ip": "168.168.168.168"
        },
        "process": {
          "name": "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
        }
      },
      "obs": {
        "ts": "2011-08-18T15:00:14.000+02:00"
      },
      "session": {
        "out": {
          "byte": 659433
        }
      },
      "target": {
        "uri": {
          "urn": "/style2.css"
        }
      },
      "type": "web"
    }
  }
}
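The `includes` key suggests that the test passes when every listed field is present in the parser output with the same value, while extra output fields are ignored. That reading is an assumption, not a description of the PlayBook implementation. Under that assumption, the check amounts to a recursive subset comparison, sketched here in Java with the hypothetical class `IncludesCheck` working on documents already loaded as nested maps.

```java
import java.util.Map;
import java.util.Objects;

final class IncludesCheck {
    /**
     * Returns true when every key of expected exists in actual with an equal
     * value. Nested maps are compared recursively; keys that only exist in
     * actual are ignored.
     */
    @SuppressWarnings("unchecked")
    static boolean includes(Map<String, Object> actual, Map<String, Object> expected) {
        for (Map.Entry<String, Object> entry : expected.entrySet()) {
            Object want = entry.getValue();
            Object got = actual.get(entry.getKey());
            if (want instanceof Map) {
                // An expected sub-document requires a sub-document on the other side.
                if (!(got instanceof Map)
                        || !includes((Map<String, Object>) got, (Map<String, Object>) want)) {
                    return false;
                }
            } else if (!Objects.equals(want, got)) {
                return false;
            }
        }
        return true;
    }
}
```

With this kind of check, a unit file only needs to list the fields the parser is responsible for, so adding a new enrichment field later does not break existing tests.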
3) Create the test
cat src/test/java/org/thales/punch/parsers/test/ApacheHttpdTest.java
package org.thales.punch.parsers.test;
import java.net.URISyntaxException;
import com.thales.services.cloudomc.punchplatform.punch.api.test.PlayBook;
import org.testng.Assert;
import org.testng.annotations.*;
// @Test at class level: TestNG runs every public method of the class as a test.
@SuppressWarnings("javadoc")
@Test
public class ApacheHttpdTest
{
    PlayBook playbook;
    @BeforeClass(alwaysRun = true)
    public void setup() throws URISyntaxException {
        playbook = new PlayBook(ApacheHttpdTest.class)
            .setBreakPoint(new PlayBook.BreakPoint() {
                public void breakpoint() {
                    System.out.println("put a breakpoint here if you have errors");
                }
            });
        // Register the grok patterns, the input field and the punchlet under test.
        playbook.addGrokPatterns("punch/patterns")
            .setInputTupleField("[logs][data]")
            .addResources(
                "punch/standard/apache_httpd/parsing.punch"
            );
    }
    @AfterClass(alwaysRun = true)
    public void close() {
        playbook.close();
    }
    public void sample_1() throws Exception
    {
        Assert.assertTrue(
            playbook.playSampleLogs("standard/apache_httpd/sample_1.txt")
        );
    }
    public void unit_1() throws Exception
    {
        Assert.assertTrue(
            playbook.playUnitTest("standard/apache_httpd/unit_1.json")
        );
    }
    public void unit_2() throws Exception
    {
        Assert.assertTrue(
            playbook.playUnitTest("standard/apache_httpd/unit_2.json")
        );
    }
}
4) Run the test
mvn test -Dtest=ApacheHttpdTest