HOWTO test a parser before going to production
Why do that
Parsers are code: they must be tested, and you must know how they perform. You need a simple, robust method for deploying a new parser, or updating an existing one, on your production platform.
Prerequisites
Make sure you have the latest standalone release that is compatible with your target platform.
What to do
Identify the parser (punchlets)
For example, suppose your topology chains the following punchlets:
- standard/common/input.punch
- standard/common/parsing_syslog_header.punch
- standard/apache_httpd/parsing.punch
- standard/apache_httpd/enrichment.punch
- standard/apache_httpd/normalization.punch
You may also need external resources for enrichment:
- standard/apache_httpd/http_codes.json
- standard/apache_httpd/taxonomy.json
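The injector commands used later on this page pass these two lists as comma-separated values. As a minimal sketch, the command line can be assembled from the lists so that the unit test and the performance test always use the same chain. The helper name `build_injector_command` and its parameters are hypothetical; only the flags that appear in the commands on this page (`-c`, `--punchlets`, `--resources`, `--stream`) are used, and any other arguments are passed through unchanged.

```python
import shlex

PUNCHLETS = [
    "standard/common/input.punch",
    "standard/common/parsing_syslog_header.punch",
    "standard/apache_httpd/parsing.punch",
    "standard/apache_httpd/enrichment.punch",
    "standard/apache_httpd/normalization.punch",
]
RESOURCES = [
    "standard/apache_httpd/http_codes.json",
    "standard/apache_httpd/taxonomy.json",
]


def build_injector_command(injector_conf, punchlets, resources, stream, extra_args=()):
    """Hypothetical helper: return the injector command as an argument list."""
    cmd = [
        "punchplatform-log-injector.sh",
        "-c", injector_conf,
        "--punchlets", ",".join(punchlets),
    ]
    if resources:
        cmd += ["--resources", ",".join(resources)]
    # extra_args are appended verbatim (for example the -n and -v options
    # used in the unit test command on this page).
    cmd += ["--stream", stream, *extra_args]
    return cmd


cmd = build_injector_command(
    "apache_httpd_injector.json", PUNCHLETS, RESOURCES, "[logs][log]", ["-n", "1", "-v"]
)
# shlex.join quotes the stream name so the shell does not treat the
# square brackets as a glob pattern.
print(shlex.join(cmd))
```

The argument list can be handed to `subprocess.run(cmd)` as is, which avoids shell quoting issues altogether.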
Identify your raw logs
- Take them from production
- Use a default log injector configuration (for instance $PUNCHPLATFORM_CONF_DIR/resources/injectors/mytenant/apache_httpd_injector.json)
- Or write your own, more complex, injector configuration file (see punchplatform-log-injector.sh)
Perform unit tests on your punchlet(s)
The following command lets you check whether a log is correctly processed by your chain of punchlets:
punchplatform-log-injector.sh -c apache_httpd_injector.json --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch --resources standard/apache_httpd/http_codes.json,standard/apache_httpd/taxonomy.json --stream [logs][log] -n 1 -v
We get the following:
registering punchlet: standard/common/input.punch
...
19:52:07 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
...
punchlets compiled
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 starts ....
input string ===========================
Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] "GET /images/KSC-94EC-412-small.gif HTTP/1.0" 200 23279 "http://www.example.com/start.html" "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
input tuple ===========================
{
"logs": {
"raw_log": "Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] "
}
}
19:52:11 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host ip" host_ip=127.0.0.1
19:52:11 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host name" host_name=MacBook-Pro-de-loic.local
19:52:11 c.t.s.c.p.p.r.o.Contains [INFO] built index for 189 entries for key set [code] in 8.033596ms
output tuple ===========================
{
"logs": {
"data": "128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ",
"log": {
"app": {
"method": "GET",
"return": {
"code": "200"
}
},
"col": {
"host": {
"name": "MacBook-Pro-de-loic.local"
}
},
"obs": {
"host": {
"name": "host0"
},
"ts": "2012-12-31T01:00:00.000+01:00"
},
"init": {
"process": {
"name": "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
},
"host": {
"ip": "128.216.77.224"
}
},
"lmc": {
"parse": {
"host": {
"ip": "127.0.0.1",
"name": "MacBook-Pro-de-loic.local"
},
"ts": "2017-11-22T19:52:11.435+01:00"
}
},
"session": {
"out": {
"byte": 23279
}
},
"channel": "unknown",
"type": "web",
"target": {
"host": {
"name": "host0"
},
"uri": {
"urn": "/images/KSC-94EC-412-small.gif"
}
},
"taxo": {
"nf": {
"sev": "2",
"alarm": "160018"
}
},
"size": 307,
"web": {
"header": {
"referer": "http://www.example.com/start.html"
}
},
"vendor": "unknown",
"action": "OK",
"rep": {
"host": {
"name": "host0"
},
"ts": "2017-11-22T19:52:11.000+01:00"
},
"tenant": "unknown"
},
"raw_log": "Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ",
"es_index": "unknown-events-2017.11.22"
}
}
output string ===========================
{"logs":{"data":"128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ","es_index":"unknown-events-2017.11.22"}}
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 duration (s): 0 sent-msg : 1 rate (1/s): 17.2
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 stopped.
Warning
Make sure you send data through the right stream using the --stream option, and check that you get the expected output before proceeding further.
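Checking the output tuple by eye becomes tedious once you test several logs. The following sketch shows one way to automate the check, assuming you captured the verbose injector output as text: it extracts the JSON printed between the "output tuple" and "output string" markers and compares a few fields against expected values. The helper names (`extract_output_tuple`, `get_path`, `check_fields`) are hypothetical and not part of the Punch tooling; the field paths come from the output shown above.

```python
import json

MISSING = object()  # sentinel for a path that does not exist in the tuple


def extract_output_tuple(text):
    """Parse the JSON document printed after the 'output tuple' marker."""
    start = text.index("output tuple")
    end = text.index("output string", start)
    body = text[start:end]
    return json.loads(body[body.index("{"):body.rindex("}") + 1])


def get_path(doc, dotted):
    """Follow a dotted path such as 'logs.log.app.method'."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return MISSING
        node = node[key]
    return node


def check_fields(doc, expected):
    """Return the list of (path, expected, actual) mismatches."""
    return [
        (path, want, get_path(doc, path))
        for path, want in expected.items()
        if get_path(doc, path) != want
    ]


# Shortened capture of the verbose output, for illustration only.
captured = """output tuple ===========================
{
  "logs": {
    "log": {
      "app": {"method": "GET", "return": {"code": "200"}},
      "init": {"host": {"ip": "128.216.77.224"}},
      "session": {"out": {"byte": 23279}}
    }
  }
}
output string ===========================
"""

doc = extract_output_tuple(captured)
mismatches = check_fields(doc, {
    "logs.log.app.method": "GET",
    "logs.log.app.return.code": "200",
    "logs.log.init.host.ip": "128.216.77.224",
    "logs.log.session.out.byte": 23279,
})
print(mismatches or "all expected fields match")
```

An empty mismatch list means every expected field was produced with the expected value; a missing field is reported with the `MISSING` sentinel as its actual value.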
Run a performance test
Finally, check the overall performance of your punchlet(s) by sending a large volume of data through your parsing.punch, or through your complete chain of punchlets, using the following command:
punchplatform-log-injector.sh -c apache_httpd_injector.json --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch --resources standard/apache_httpd/http_codes.json,standard/apache_httpd/taxonomy.json --stream [logs][log] -t 50000 -n 1000000
Output example:
registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering resource: standard/apache_httpd/http_codes.json
registering resource: standard/apache_httpd/taxonomy.json
registering groks from /home/punch/Bureau/punch-standalone-6.0.0/conf/resources/punch/patterns
compiling ...
punchlets compiled
[Thu Dec 05 17:40:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Thu Dec 05 17:40:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2 sent-msg : 16074 rate (1/s): 8033.0
[Thu Dec 05 17:40:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4 sent-msg : 60674 rate (1/s): 22293.0
[Thu Dec 05 17:40:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6 sent-msg : 106849 rate (1/s): 23074.0
[Thu Dec 05 17:40:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8 sent-msg : 152797 rate (1/s): 22971.5
[Thu Dec 05 17:41:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10 sent-msg : 197579 rate (1/s): 22389.5
[Thu Dec 05 17:41:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12 sent-msg : 242471 rate (1/s): 22444.5
[Thu Dec 05 17:41:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14 sent-msg : 286735 rate (1/s): 22130.5
[Thu Dec 05 17:41:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16 sent-msg : 331430 rate (1/s): 22345.5
[Thu Dec 05 17:41:08 CET 2019] client.apache_httpd_injector.json0 duration (s): 18 sent-msg : 376152 rate (1/s): 22359.0
[Thu Dec 05 17:41:10 CET 2019] client.apache_httpd_injector.json0 duration (s): 20 sent-msg : 420825 rate (1/s): 22335.0
[Thu Dec 05 17:41:12 CET 2019] client.apache_httpd_injector.json0 duration (s): 22 sent-msg : 465371 rate (1/s): 22271.0
[Thu Dec 05 17:41:14 CET 2019] client.apache_httpd_injector.json0 duration (s): 24 sent-msg : 509629 rate (1/s): 22127.5
[Thu Dec 05 17:41:16 CET 2019] client.apache_httpd_injector.json0 duration (s): 26 sent-msg : 553725 rate (1/s): 22046.0
[Thu Dec 05 17:41:18 CET 2019] client.apache_httpd_injector.json0 duration (s): 28 sent-msg : 598827 rate (1/s): 22549.0
[Thu Dec 05 17:41:20 CET 2019] client.apache_httpd_injector.json0 duration (s): 30 sent-msg : 644384 rate (1/s): 22776.5
[Thu Dec 05 17:41:22 CET 2019] client.apache_httpd_injector.json0 duration (s): 32 sent-msg : 689784 rate (1/s): 22698.5
[Thu Dec 05 17:41:24 CET 2019] client.apache_httpd_injector.json0 duration (s): 34 sent-msg : 735972 rate (1/s): 23091.5
[Thu Dec 05 17:41:26 CET 2019] client.apache_httpd_injector.json0 duration (s): 36 sent-msg : 781986 rate (1/s): 22994.5
[Thu Dec 05 17:41:28 CET 2019] client.apache_httpd_injector.json0 duration (s): 38 sent-msg : 827973 rate (1/s): 22991.5
[Thu Dec 05 17:41:30 CET 2019] client.apache_httpd_injector.json0 duration (s): 40 sent-msg : 873869 rate (1/s): 22946.5
[Thu Dec 05 17:41:32 CET 2019] client.apache_httpd_injector.json0 duration (s): 42 sent-msg : 919993 rate (1/s): 23060.0
[Thu Dec 05 17:41:34 CET 2019] client.apache_httpd_injector.json0 duration (s): 44 sent-msg : 965770 rate (1/s): 22875.6
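The progress lines above can be summarised with a short script rather than read one by one. This is a minimal sketch, assuming the line format shown in the output example; the function names and the `warmup_s` parameter are hypothetical. It averages the reported rates while skipping the first samples, since the first line of the example reports a much lower rate than the following ones.

```python
import re
from statistics import mean

# Matches the "duration (s): N sent-msg : N rate (1/s): N.N" part of a progress line.
PROGRESS = re.compile(
    r"duration \(s\):\s*(\d+)\s+sent-msg\s*:\s*(\d+)\s+rate \(1/s\):\s*([\d.]+)"
)


def parse_progress(text):
    """Return (duration_s, sent_msg, rate) tuples found in the injector output."""
    return [
        (int(m.group(1)), int(m.group(2)), float(m.group(3)))
        for m in PROGRESS.finditer(text)
    ]


def steady_state_rate(samples, warmup_s=2):
    """Mean rate over samples reported after the warm-up period, or None."""
    rates = [rate for duration, _, rate in samples if duration > warmup_s]
    return mean(rates) if rates else None


# First three progress lines of the output example.
sample = """\
[Thu Dec 05 17:40:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2 sent-msg : 16074 rate (1/s): 8033.0
[Thu Dec 05 17:40:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4 sent-msg : 60674 rate (1/s): 22293.0
[Thu Dec 05 17:40:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6 sent-msg : 106849 rate (1/s): 23074.0
"""

samples = parse_progress(sample)
print(steady_state_rate(samples))  # 22683.5 for these three lines
```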
Note
If you look at the rate indicator, you can see that, after the first sample, our chain of punchlets processes roughly 22 to 23 kEPS (thousands of events per second), which suggests that it is well optimised.
Interestingly, we found a correlation between the theoretical throughput measured locally with this log injector and what you can expect in practice in your production environment. By running performance tests on all our standard parsers, we observed a 52% ratio between the injector test and true end-to-end processing of the data on the same machine. Moreover, provided that topologies are properly configured, performance scales proportionally with the number of workers and executors of the Punch Bolt.
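As a rough illustration of how these figures can be used for sizing, the sketch below turns an injector rate into an end-to-end estimate. It rests on two assumptions that you should validate on your own platform: that the 52% ratio means end-to-end throughput is about 0.52 times the injector rate, and that throughput scales linearly with the number of executors. The function name and parameters are hypothetical.

```python
# Ratio quoted above for standard parsers; interpreted here (assumption) as
# end-to-end rate = 0.52 * injector rate on the same machine.
INJECTOR_TO_END_TO_END = 0.52


def estimate_end_to_end_eps(injector_eps, executors=1, ratio=INJECTOR_TO_END_TO_END):
    """Estimate end-to-end events per second, assuming linear scaling per executor."""
    return injector_eps * ratio * executors


single = estimate_end_to_end_eps(22500)
scaled = estimate_end_to_end_eps(22500, executors=4)
print(round(single), round(scaled))  # 11700 46800
```

Treat the result as an order of magnitude only: the ratio was measured on the standard parsers, and your own punchlets and topology settings may behave differently.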