Write industrial parsers

You have succeeded in writing a first decent parser. The next step is to make sure you are in line with developer best practices, so that you can maintain your code and publish it to the community.

After reading this chapter, you will be able to submit a Standard Log Parser to PunchPlatform!

Make it modular

The punchlet we have so far is compact. But it does several different things you are likely to repeat for every parser:

  • extract the timestamp from the header
  • store the original log in a field for later archiving
  • parse vendor specific fields
  • normalize
  • cleanup
  • etc …

Copying and pasting a piece of code to produce another one is NOT a good idea. The day you decide to change something in one parser, you will have to repeat the change in all the other parsers. It will never work.

Instead, we will split our punchlet into three. The first one takes care of the [message] issue, plus a few additional goodies: it adds an input timestamp to keep track of the arrival date of the log in the PunchPlatform, adds a unique id, etc. That punchlet already exists; you simply need to reuse it. It is located in:

$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/input.punch

Next, we take care of the timestamp header. Again, a punchlet already exists:

$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/parsing_syslog_header.punch

Last, we put the rest (the parsing part) in a parsing punchlet:

$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/arkoon_ngfw/parsing.punch

Instead of one (punchlet) function, we now have three. We simply need to chain them in the log flow.

Conclusion: it is best to separate the “payload” of your log from its headers and to handle each in its own punchlet, so that every piece is reusable.
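
To make the chaining idea concrete, here is a minimal Python analogy. It is not punch code: the stage functions, the field names (raw, id, header, payload, fields) and the simplified "header: payload" log format are all hypothetical, chosen only to show how small generic stages and one vendor-specific stage can be composed.

```python
import uuid
from functools import reduce
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def input_stage(doc: Document) -> Document:
    """Generic stage: keep the raw log and tag it with a unique id."""
    out = dict(doc)
    out["raw"] = doc["message"]      # hypothetical field name
    out["id"] = str(uuid.uuid4())    # hypothetical field name
    return out


def header_stage(doc: Document) -> Document:
    """Generic stage: split a toy 'header: payload' line (not real syslog)."""
    out = dict(doc)
    header, _, payload = str(doc["message"]).partition(": ")
    out["header"] = header
    out["payload"] = payload
    return out


def vendor_parsing_stage(doc: Document) -> Document:
    """Vendor-specific stage: turn space-separated key=value pairs into fields."""
    out = dict(doc)
    pairs = (item.split("=", 1) for item in str(doc["payload"]).split() if "=" in item)
    out["fields"] = dict(pairs)
    return out


def run_chain(stages: List[Stage], doc: Document) -> Document:
    """Apply each stage in order, feeding the output of one into the next."""
    return reduce(lambda current, stage: stage(current), stages, doc)


result = run_chain(
    [input_stage, header_stage, vendor_parsing_stage],
    {"message": "Alerts: pri=2 fw=firewall02.group aktype=ALERT"},
)
print(result["fields"])  # {'pri': '2', 'fw': 'firewall02.group', 'aktype': 'ALERT'}
```

Only the last stage knows anything about the vendor format, so the first two can be reused unchanged in front of any other parser.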

Test-driven development

You have a parser, which is great for handling one log. If you want to enrich it, you must make sure that you do not introduce any regression. To do so, you use unit tests. A unit test is a JSON file that, in our case, looks basically like this:

{
  "input": {
    "punchlets": [
      "standard/arkoon_ngfw/parsing.punch"
    ],
    "message": "Alerts: AKLOG - id=firewall time=\"2015-11-17 10:46:02\" gmtime=1447753562 pri=2 fw=firewall02.group aktype=ALERT alert_type=\"Blocked by filter\" user=\"\" alert_level=\"Medium\" alert_desc=\"PROTO:112 from 10.0.0.1 to 100.0.0.1 [Default deny on input]\""
  },
  "output": {
    "includes": {
      "alarm": {
        "name": "ALERTBLOCKEDBYFILTER",
        "sev": "Medium"
      },
      "app": {
        "proto": {
          "name": "PROTO:112"
        }
      },
      "init": {
        "host": {
          "ip": "10.0.0.1"
        }
      },
      "obs": {
        "host": {
          "name": "firewall02.group"
        },
        "ts": "2015-11-17T10:46:02.000+01:00"
      },
      "target": {
        "host": {
          "ip": "100.0.0.1"
        }
      },
      "type": "ids"
    }
  }
}

You can see the “input” part, which holds the punchlets to run and your log, and the “output” part, which holds the expected result. When you receive a new log format, before tweaking your punchlet, build this JSON file by hand to decide where you want to place each field and how. Then you can try to comply with it via punch code.

Place it in the resources/punch/test/arkoon_ngfw/ directory. Then you can run your unit tests very simply:
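
The "includes" key suggests that the expected fields only need to be present in the parsed document, and that extra fields are tolerated. The Python sketch below illustrates that reading as a recursive subset check. This is an assumption about the semantics, not the actual implementation of the test runner, and the includes function is a hypothetical helper.

```python
from typing import Any


def includes(actual: Any, expected: Any) -> bool:
    """Return True if every expected key/value is found, recursively, in actual."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and includes(actual[key], value)
            for key, value in expected.items()
        )
    # Leaves (strings, numbers, ...) must match exactly.
    return actual == expected


# A parsed document may carry more fields than the test expects.
parsed = {
    "obs": {"host": {"name": "firewall02.group"}, "ts": "2015-11-17T10:46:02.000+01:00"},
    "init": {"host": {"ip": "10.0.0.1"}},
    "type": "ids",
    "extra": "not checked by the test",
}

print(includes(parsed, {"init": {"host": {"ip": "10.0.0.1"}}, "type": "ids"}))  # True
print(includes(parsed, {"type": "firewall"}))                                   # False
```

Under this reading, a test file only has to list the fields you care about, which keeps it stable when a punchlet later adds unrelated fields.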

$ punchplatform-puncher.sh -t resources/punch/test/arkoon_ngfw/

If you have a custom path or configuration:

$ punchplatform-puncher.sh -t resources/punch/test/arkoon_ngfw/ -p <custom_parser_path>

For each new logtype tested, every unit test should be OK.

Benchmarking

Is your punchlet efficient? The only way to answer this question is to test its efficiency with a benchmark. To do so, we will rely on the punchplatform-log-injector.sh utility.

By convention, we write performance tests in $PUNCHPLATFORM_CONF_DIR/resources/injector/<tenant>/perf.

To have a reproducible test, we will create a short shell script. For the standard apache_httpd parser, a benchmark could look like:

#!/bin/bash -u

punchplatform-log-injector.sh \
-c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json \
--punchlets \
standard/common/input.punch,\
standard/common/parsing_syslog_header.punch,\
standard/apache_httpd/parsing.punch,\
standard/apache_httpd/enrichment.punch,\
standard/apache_httpd/normalization.punch,\
standard/apache_httpd/taxonomy.json,\
standard/apache_httpd/http_codes.json

Let’s see how this script works:

  1. We call the punchplatform-log-injector.sh script, which does the heavy lifting.
  2. The -c option sets the injector file to use for input log generation.
  3. The --punchlets option is a comma-separated list of the punchlets used in our pipeline, together with the JSON resource files they rely on.

Now, run this script and let's see what happens:

$ ./apache_httpd_perf.sh
registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering punchlet: standard/apache_httpd/taxonomy.json
registering punchlet: standard/apache_httpd/http_codes.json
registering groks from [...]/punchplatform-standalone-4.0.1-SNAPSHOT/conf/resources/punch/patterns
compiling ...
15:53:54 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
15:53:54 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=189 resource_name="taxonomy"
15:53:55 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
15:53:55 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=189 resource_name="taxonomy"
15:53:56 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
15:53:56 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=189 resource_name="taxonomy"
15:53:56 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
15:53:56 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=189 resource_name="taxonomy"
15:53:56 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
15:53:56 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=189 resource_name="taxonomy"
punchlets compiled
running punchlets using infinite loop
running punchlet at maximum throughput
[Mon Dec 11 15:53:57 CET 2017] client.apache_httpd_injector.json0 starts ....
15:53:57 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host ip" host_ip=127.0.0.1
15:53:57 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host name" host_name=PunchPlatform.local
15:53:57 c.t.s.c.p.p.r.o.Contains [INFO] built index for 189 entries for key set [code] in 2.824894ms
[Mon Dec 11 15:53:59 CET 2017] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 26785      rate (1/s): 13385.3
[Mon Dec 11 15:54:01 CET 2017] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 58769      rate (1/s): 15954.6
[Mon Dec 11 15:54:03 CET 2017] client.apache_httpd_injector.json0 duration (s): 6     sent-msg : 91294      rate (1/s): 16261.0
[Mon Dec 11 15:54:05 CET 2017] client.apache_httpd_injector.json0 duration (s): 8     sent-msg : 123908     rate (1/s): 16305.0
[Mon Dec 11 15:54:07 CET 2017] client.apache_httpd_injector.json0 duration (s): 10    sent-msg : 159612     rate (1/s): 17814.9
[Mon Dec 11 15:54:09 CET 2017] client.apache_httpd_injector.json0 duration (s): 12    sent-msg : 195114     rate (1/s): 17740.6
[Mon Dec 11 15:54:11 CET 2017] client.apache_httpd_injector.json0 duration (s): 14    sent-msg : 230990     rate (1/s): 17937.0
[Mon Dec 11 15:54:13 CET 2017] client.apache_httpd_injector.json0 duration (s): 16    sent-msg : 267378     rate (1/s): 18147.6
[Mon Dec 11 15:54:15 CET 2017] client.apache_httpd_injector.json0 duration (s): 18    sent-msg : 303324     rate (1/s): 17927.2
[Mon Dec 11 15:54:17 CET 2017] client.apache_httpd_injector.json0 duration (s): 20    sent-msg : 339166     rate (1/s): 17919.5
[Mon Dec 11 15:54:19 CET 2017] client.apache_httpd_injector.json0 duration (s): 22    sent-msg : 375251     rate (1/s): 18032.0
[Mon Dec 11 15:54:21 CET 2017] client.apache_httpd_injector.json0 duration (s): 24    sent-msg : 411340     rate (1/s): 17998.0
[Mon Dec 11 15:54:23 CET 2017] client.apache_httpd_injector.json0 duration (s): 26    sent-msg : 447581     rate (1/s): 18074.3
[Mon Dec 11 15:54:25 CET 2017] client.apache_httpd_injector.json0 duration (s): 28    sent-msg : 483529     rate (1/s): 17972.5
[Mon Dec 11 15:54:27 CET 2017] client.apache_httpd_injector.json0 duration (s): 30    sent-msg : 519685     rate (1/s): 18076.5

We are now in an infinite loop which tries to process as many logs as possible. This way, we can test the performance of an end-to-end punchlet pipeline. To get the best result, a good practice is to let the script run for at least 5 minutes (as a warm-up) and then take the rate observed afterwards as your real result.
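
If you capture the injector output in a file, the warm-up rule can be applied mechanically. The Python sketch below averages the reported rates after a warm-up period. The steady_state_rate helper is hypothetical, and the regular expression assumes the line layout shown in the output above; the sample uses a 10 second warm-up only because the excerpt is short.

```python
import re
from statistics import mean
from typing import Iterable

# Matches lines such as:
# "... duration (s): 28    sent-msg : 483529     rate (1/s): 17972.5"
RATE_LINE = re.compile(
    r"duration \(s\):\s*(\d+)\s+sent-msg\s*:\s*(\d+)\s+rate \(1/s\):\s*([\d.]+)"
)


def steady_state_rate(lines: Iterable[str], warmup_seconds: int) -> float:
    """Average the rates reported strictly after the warm-up period."""
    rates = []
    for line in lines:
        match = RATE_LINE.search(line)
        if match and int(match.group(1)) > warmup_seconds:
            rates.append(float(match.group(3)))
    if not rates:
        raise ValueError("no rate line found after the warm-up period")
    return mean(rates)


sample = [
    "[Mon Dec 11 15:53:59 CET 2017] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 26785      rate (1/s): 13385.3",
    "[Mon Dec 11 15:54:01 CET 2017] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 58769      rate (1/s): 15954.6",
    "[Mon Dec 11 15:54:25 CET 2017] client.apache_httpd_injector.json0 duration (s): 28    sent-msg : 483529     rate (1/s): 17972.5",
    "[Mon Dec 11 15:54:27 CET 2017] client.apache_httpd_injector.json0 duration (s): 30    sent-msg : 519685     rate (1/s): 18076.5",
]

print(steady_state_rate(sample, warmup_seconds=10))  # 18024.5
```

For a real measurement, pass warmup_seconds=300 to follow the 5 minute recommendation.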

Important

This benchmark heavily depends on your testing platform setup. The tool is not meant to give an absolute or reference rate, but to compare the behaviour of a new parser with others. It can be helpful for measuring a parser's performance after a refactoring or an improvement.
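
Since only relative figures are meaningful, one simple way to report a result is the percentage change between two runs made on the same machine. The Python sketch below shows that arithmetic; the function name and both rates are hypothetical, not measurements.

```python
def relative_change_percent(baseline_rate: float, candidate_rate: float) -> float:
    """Percentage change of candidate_rate relative to baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (candidate_rate - baseline_rate) / baseline_rate * 100.0


# Hypothetical steady-state rates (logs per second), measured on the same platform.
before_refactoring = 16000.0
after_refactoring = 18000.0

change = relative_change_percent(before_refactoring, after_refactoring)
print(f"{change:+.1f}%")  # +12.5%
```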

Going to production

After checking that every unit test passes, set up a channel in your standalone that can handle your real logs (a “Single” scheme should do the job, see the examples in your Standalone) and try to send some of them into it. Check in Kibana that all your logs look as you expected, and then you can put your punchlets in the real flow.

Contributing

PunchPlatform needs you, and you need PunchPlatform. Your work is extremely valuable: a standard base of punchlet parsers is our best asset. And yours:

  • If you use a standard log parser, you have the assurance of support in case of a problem. So if you manage to make your parser standard (or to update a standard one), you will be helped by experts;
  • If you need a new parser, first check the <standardLogParsers> section. Who knows if a teammate has already written it? By contributing, you take part in this virtuous circle.

Send your parsing punchlets, your header punchlets, your GROK patterns and your unit tests to our mailing list (cf. the <contacts> section); if you properly followed this section, they might become standard after internal testing.