Tutorial: Write a Production-Grade Parser¶
You have succeeded in writing a first decent parser. The next step is to make sure you are in line with developers' best practices, so that you can maintain your code and publish it to the community.
After reading this chapter, you will be able to submit a parser to the PunchPlatform standard_log_parsers.
Make it modular¶
The punchlet we have so far is compact. But it does several different things you are likely to repeat for every parser:
- extract the timestamp from the header
- store the original log in a field for later archiving
- parse vendor specific fields
- normalize
- cleanup
- etc ...
Copying and pasting a piece of code to produce another one is NOT a good idea. The day you decide to change something in one parser, you will have to repeat the change in every parser, which quickly becomes unmaintainable.
Instead we will split our punchlet into three. The first one takes care of the [ [message ]] issue, plus a few additional goodies: it adds an input timestamp to keep track of the arrival date of the log in the PunchPlatform, adds a unique id, etc. That punchlet already exists; you simply need to reuse it. It is located in:
$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/input.punch
Next we take care of the timestamp header. Again, a punchlet already exists:
$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/parsing_syslog_header.punch
Last, we put the rest (the parsing part) in a parsing punchlet:
$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/arkoon_ngfw/parsing.punch
Instead of one (punchlet) function, we now have three. We simply need to chain these in the log flow.
Conclusion: it is best to handle the headers of your log and its vendor-specific content separately, one punchlet each, so that each punchlet is reusable.
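The idea of chaining small, reusable stages can be sketched outside of Punch. The following Python example is only an analogy, not the actual content of the standard punchlets: the function names, the field names (`arrival_ts`, `id`, `header_ts`, `body`, `fields`) and the sample log line are all hypothetical.

```python
import time
import uuid

def input_stage(doc):
    """Common stage: keep the raw log, stamp an arrival time and a unique id."""
    doc["log"] = doc["raw_log"]
    doc["arrival_ts"] = time.time()
    doc["id"] = str(uuid.uuid4())
    return doc

def header_stage(doc):
    """Common stage: split a leading timestamp from the rest of the line."""
    doc["header_ts"], _, doc["body"] = doc["log"].partition(" ")
    return doc

def vendor_stage(doc):
    """Vendor-specific stage: parse space-separated key=value pairs."""
    doc["fields"] = dict(
        pair.split("=", 1) for pair in doc["body"].split() if "=" in pair
    )
    return doc

def run_chain(stages, doc):
    """Apply each stage in order, passing the document along."""
    for stage in stages:
        doc = stage(doc)
    return doc

# Hypothetical sample line, for illustration only.
parsed = run_chain(
    [input_stage, header_stage, vendor_stage],
    {"raw_log": "2019-02-19T10:28:50Z src=10.0.0.1 action=accept"},
)
```

Only `vendor_stage` would need to change for another log source; the two common stages are reused as they are.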
Manage Parsing Errors¶
Think about parsing errors!
Each grok, csv, dissect or kv operator must be protected against parsing errors with:
{
raise("ADAPT YOUR MESSAGE")
}
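The same defensive pattern, sketched in Python as an analogy rather than Punch code: the result of the match is checked, and an explicit, descriptive error is raised when the input does not fit. The pattern and the `parse_access` helper are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical pattern for illustration: "<ip> <method> <path>".
ACCESS_RE = re.compile(
    r"(?P<ip>\d+\.\d+\.\d+\.\d+) (?P<method>[A-Z]+) (?P<path>\S+)"
)

def parse_access(line):
    """Parse one line, raising a descriptive error instead of failing silently."""
    match = ACCESS_RE.fullmatch(line)
    if match is None:
        # Say which parsing step failed and on which input.
        raise ValueError(f"access pattern did not match: {line!r}")
    return match.groupdict()
```

A message that names the failing step makes it much easier to find the offending logs later.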
Benchmarking¶
Is your punchlet efficient? The only way to find out is to test it with a benchmark. To do so, we will rely on the punchplatform-log-injector.sh utility.
By convention, performance tests are written in $PUNCHPLATFORM_CONF_DIR/resources/injector/<tenant>/perf.
To have a reproducible test, we will create a short shell script. To test the standard apache_httpd parser, we will set up a test using these 2 steps.
1) Write a local_parser that copies [logs][raw_log] into [logs][log]. To do so, copy and paste this code into a new file called /resources/punch/standard/common/local_for_test_perf.punch
{
[logs][log] = [logs][raw_log];
}
2) Test the parser performance using this command:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json
Let's see how this command works:
- We call the punchplatform-log-injector.sh script, which does the heavy lifting.
- The -c option sets the injector file to use for input log generation.
- The --punchlets option is a comma-separated list of the punchlets used in our pipeline.
Now, run this command and see what happens:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json
registering punchlet: standard/common/local_for_test_perf.punch
registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering punchlet: standard/apache_httpd/taxonomy.json
registering punchlet: standard/apache_httpd/http_codes.json
registering groks from /home/gmfmi/punchplatform/standalone/punch-standalone-5.1.2/conf/resources/punch/patterns
compiling ...
punchlets compiled
running punchlets using infinite loop
running punchlet at maximum throughput
[Tue Feb 19 10:28:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Tue Feb 19 10:28:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2 sent-msg : 1065 rate (1/s): 532.5
[Tue Feb 19 10:28:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4 sent-msg : 24068 rate (1/s): 11497.0
[Tue Feb 19 10:28:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6 sent-msg : 55976 rate (1/s): 15944.5
[Tue Feb 19 10:28:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8 sent-msg : 89516 rate (1/s): 16768.0
[Tue Feb 19 10:29:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10 sent-msg : 123541 rate (1/s): 17011.0
[Tue Feb 19 10:29:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12 sent-msg : 158056 rate (1/s): 17247.4
[Tue Feb 19 10:29:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14 sent-msg : 192624 rate (1/s): 17278.5
[Tue Feb 19 10:29:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16 sent-msg : 227265 rate (1/s): 17319.5
We are now in an infinite loop that tries to process as many logs as possible. This way, we can test the performance of an end-to-end punchlet pipeline. To get the best result, a good practice is to let the script run for at least 5 minutes (as a warm-up) and then take the rate as your real result.
Important
This benchmark heavily depends on your testing platform setup. The benchmark tool is not designed to give an absolute or reference rate, but to compare the behaviour of a new parser with others. It is helpful for measuring a parser's performance after a refactoring or an improvement.
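To compare two runs on the same machine, the injector output can be post-processed. The following Python sketch is not part of the PunchPlatform tooling. It assumes the progress line format shown in the output above, and the helper names `steady_rate` and `relative_change` are hypothetical.

```python
import re
from statistics import mean

# Matches the progress lines printed by the injector, for example:
# "... duration (s): 4 sent-msg : 24068 rate (1/s): 11497.0"
LINE_RE = re.compile(
    r"duration \(s\):\s*(\d+)\s+sent-msg\s*:\s*(\d+)\s+rate \(1/s\):\s*([\d.]+)"
)

# A few lines copied from the run shown above.
SAMPLE_LINES = [
    "[Tue Feb 19 10:29:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10 sent-msg : 123541 rate (1/s): 17011.0",
    "[Tue Feb 19 10:29:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12 sent-msg : 158056 rate (1/s): 17247.4",
    "[Tue Feb 19 10:29:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14 sent-msg : 192624 rate (1/s): 17278.5",
    "[Tue Feb 19 10:29:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16 sent-msg : 227265 rate (1/s): 17319.5",
]

def steady_rate(lines, warmup_s=300):
    """Average the reported rate over samples taken after the warm-up period."""
    rates = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and int(m.group(1)) > warmup_s:
            rates.append(float(m.group(3)))
    if not rates:
        raise ValueError("no sample found after the warm-up period")
    return mean(rates)

def relative_change(before, after):
    """Relative rate change between two runs (0.1 means 10% faster)."""
    return (after - before) / before
```

Run the benchmark once before and once after your change, compute `steady_rate` on each output, and look at `relative_change` rather than at the raw figures.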
Going to production¶
Before going to production, you must write unit tests. Several alternatives are available:
- punchplatform-log-injector.sh
- punchlinectl
Test with a punchplatform-log-injector¶
1) Write a log injector configuration file
{
"destination" : { "proto" : "tcp", "host" : "127.0.0.1", "port" : 9901 },
"load" :{
"total_messages" : 1,
"stats_publish_interval" : "1s",
"message_throughput" : 1
},
"message" : {
"payloads" : [
"foo bar"
],
"fields" : {
}
}
}
2) Write your parser
{
[logs][log] = [logs][raw_log];
dissect("%{?a} %{&a}").on([logs][log]).into([dissect]);
}
3) Run the test
punchplatform-log-injector.sh -c injector_conf.json --punchlets parser_to_test.punch
The punchlets must be placed under resources/punch
Test with punchplatform-topology¶
1) Write a test topology
{
"dag":[
{
"type": "generator_input",
"settings": {
"messages": [
"{\"type\":\"fw\",\"obs\":{\"ts\":\"2018-11-12T23:56:03.000Z\"},\"init\":{\"host\":{\"ip\":\"127.0.0.1\"}},\"target\":{\"host\":{\"port\":1}}}"
]
},
"storm_settings": {
"component": "generator"
}
},
{
"type": "punchlet_node",
"settings": {
"punchlet": "./your_punchlet.punch"
},
"storm_settings": {
"component": "punchlet_node",
"subscribe": [
{
"component": "generator"
}
]
}
},
{
"type": "punchlet_node",
"settings": {
"punchlet_code": "{print(root);}"
},
"storm_settings": {
"component": "print_node",
"subscribe": [
{
"component": "punchlet_node"
}
]
}
}
]
}
2) Run the topology
punchlinectl parsing_test_topology.json
After checking that every unit test passes, set up a channel in your standalone that can handle your real logs (a scheme should do the job, see the examples in your Standalone) and try to send some of them into it. Check in Kibana that all your logs look as expected, and then you can put your punchlets in the real flow.
Contributing¶
The PunchPlatform needs you just as you need the PunchPlatform. Your work is extremely valuable: having a standard base of punchlet parsers is our best asset.
- If you use a standard log parser, you are assured of support in case of a problem. So if you manage to make your parser standard (or update an existing one), you get help from experts;
- If you need a new parser, first check the standard_log_parsers chapter. Who knows, a teammate may have already written it. By contributing, you take part in this virtuous circle.
Write development unit tests¶
If you package your punchlets using punch-compatible archives, the puncher lets you automatically execute all or some of your unit or sample tests. Simply execute it with the '-T' option as follows:
punchplatform-puncher.sh -T <path to your repository>
Using this mode the puncher will scan your folders and look for punchlet group manifest files. Here is an example:
---
spec:
  punchlets:
    - punchlet: parser.punch
      resources:
        - resources/color_codes.json
      groks:
        - groks/pattern.grok
      inputStream:
        - name: logs
          fields:
            - name: data
              type: string
    - punchlet: enrich.punch
      inputStream:
        - name: logs
          fields:
            - name: data
              type: string
You can define two types of tests: unit tests, which precisely test a single punchlet or a chain of punchlets, and (so-called) sample tests, where you only ingest a file of sample logs and check that the chain of punchlets defined in your manifest file reports no error when processing your samples.
Sample tests automatically use the 'inputStream' specification of your first punchlet to determine the input stream and field format expected by that first punchlet.
To play a single test file, simply type in the path to that test file. This works for both unit and sample test files.
punchplatform-puncher.sh -T ./src/com/mycompany/sample/test/unit_chain.json
Here is an example unit test:
{
"input": {
"message": "168.168.168.168 - - [18/Aug/2011:06:00:14 -0700] \"GET /style2.css HTTP/1.1\" 200 659433 \"http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html\" \"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\""
},
"output": {
"includes": {
"app": {
"method": "GET",
"return": {
"code": "200"
}
},
"init": {
"host": {
"ip": "168.168.168.168"
},
"process": {
"name": "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
}
},
"obs": {
"ts": "2011-08-18T15:00:14.000+02:00"
},
"session": {
"out": {
"byte": 659433
}
},
"target": {
"uri": {
"urn": "/style2.css"
}
},
"type": "web"
}
}
}
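The "includes" block reads naturally as a partial match: every field listed there is expected in the parsed document with the same value, while other fields are ignored. That reading is an assumption; the puncher's exact matching rules are not described here. Under that assumption, the comparison can be sketched in Python as follows (the `includes` helper is hypothetical and is not the puncher's implementation):

```python
def includes(actual, expected):
    """Return True if every key/value of `expected` is found in `actual`.

    Nested dictionaries are compared recursively; keys that exist only
    in `actual` are ignored.
    """
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and includes(actual[key], value)
            for key, value in expected.items()
        )
    return actual == expected

# A parsed document with more fields than the test asks for.
parsed = {
    "type": "web",
    "app": {"method": "GET", "return": {"code": "200"}},
    "session": {"out": {"byte": 659433}},
}
ok = includes(parsed, {"app": {"method": "GET"}, "type": "web"})
ko = includes(parsed, {"app": {"method": "POST"}})
```

With this kind of matching a unit test keeps passing when a punchlet later adds new fields, and fails only when a listed field is missing or different.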