HOWTO test a parser before going to production

Why do this

Parsers are code. They must be tested, and you must know how they perform. You need a simple, robust method for deploying a new parser, or updating an existing one, on your production platform.

Prerequisites

Make sure you have the latest standalone release that is compatible with your target platform.

What to do

Identify the parser (punchlets)

For example, assume that your topology chains the following punchlets:

  • standard/common/input.punch
  • standard/common/parsing_syslog_header.punch
  • standard/apache_httpd/parsing.punch
  • standard/apache_httpd/enrichment.punch
  • standard/apache_httpd/normalization.punch

You may also need external resources for enrichment:

  • standard/apache_httpd/http_codes.json
  • standard/apache_httpd/taxonomy.json

Identify your raw logs:

  • Take them from production
  • Use a default log injector file (for instance $PUNCHPLATFORM_CONF_DIR/resources/injectors/mytenant/apache_httpd_injector.json)
  • Or write your own, more complex injector configuration file for punchplatform-log-injector.sh

Perform unit tests on your punchlet(s)

The following command lets you check whether a log is correctly processed by your chain of punchlets.

punchplatform-log-injector.sh -c apache_httpd_injector.json --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch --resources standard/apache_httpd/http_codes.json,standard/apache_httpd/taxonomy.json --stream [logs][log] -n 1 -v

We get the following:

registering punchlet: standard/common/input.punch
...
19:52:07 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
...
punchlets compiled
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 starts ....
input string ===========================
Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] "GET /images/KSC-94EC-412-small.gif HTTP/1.0" 200 23279 "http://www.example.com/start.html" "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
input tuple ===========================
{
  "logs": {
    "raw_log": "Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] "
  }
}
19:52:11 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host ip" host_ip=127.0.0.1
19:52:11 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host name" host_name=MacBook-Pro-de-loic.local
19:52:11 c.t.s.c.p.p.r.o.Contains [INFO] built index for 189 entries for key set [code] in 8.033596ms
output tuple ===========================
{
  "logs": {
    "data": "128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ",
    "log": {
      "app": {
        "method": "GET",
        "return": {
          "code": "200"
        }
      },
      "col": {
        "host": {
          "name": "MacBook-Pro-de-loic.local"
        }
      },
      "obs": {
        "host": {
          "name": "host0"
        },
        "ts": "2012-12-31T01:00:00.000+01:00"
      },
      "init": {
        "process": {
          "name": "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
        },
        "host": {
          "ip": "128.216.77.224"
        }
      },
      "lmc": {
        "parse": {
          "host": {
            "ip": "127.0.0.1",
            "name": "MacBook-Pro-de-loic.local"
          },
          "ts": "2017-11-22T19:52:11.435+01:00"
        }
      },
      "session": {
        "out": {
          "byte": 23279
        }
      },
      "channel": "unknown",
      "type": "web",
      "target": {
        "host": {
          "name": "host0"
        },
        "uri": {
          "urn": "/images/KSC-94EC-412-small.gif"
        }
      },
      "taxo": {
        "nf": {
          "sev": "2",
          "alarm": "160018"
        }
      },
      "size": 307,
      "web": {
        "header": {
          "referer": "http://www.example.com/start.html"
        }
      },
      "vendor": "unknown",
      "action": "OK",
      "rep": {
        "host": {
          "name": "host0"
        },
        "ts": "2017-11-22T19:52:11.000+01:00"
      },
      "tenant": "unknown"
    },
    "raw_log": "Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ",
    "es_index": "unknown-events-2017.11.22"
  }
}
output string ===========================
{"logs":{"data":"128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ","es_index":"unknown-events-2017.11.22"}}
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 duration (s): 0     sent-msg : 1          rate (1/s): 17.2
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 stopped.
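
The comma-separated --punchlets and --resources arguments of the command above are long and easy to mistype. As a minimal sketch, the command line can be assembled from plain lists in Python. The helper name build_injector_command is hypothetical and not part of the platform; it only uses the options that appear in this HOWTO (-c, --punchlets, --resources, --stream, -n, -v).

```python
import shlex


def build_injector_command(injector_conf, punchlets, resources, stream, count, verbose=False):
    """Assemble the injector command line as a list of arguments.

    Hypothetical helper: it only concatenates the options shown in this HOWTO.
    """
    command = [
        "punchplatform-log-injector.sh",
        "-c", injector_conf,
        "--punchlets", ",".join(punchlets),
        "--resources", ",".join(resources),
        "--stream", stream,
        "-n", str(count),
    ]
    if verbose:
        command.append("-v")
    return command


punchlets = [
    "standard/common/input.punch",
    "standard/common/parsing_syslog_header.punch",
    "standard/apache_httpd/parsing.punch",
    "standard/apache_httpd/enrichment.punch",
    "standard/apache_httpd/normalization.punch",
]
resources = [
    "standard/apache_httpd/http_codes.json",
    "standard/apache_httpd/taxonomy.json",
]

command = build_injector_command(
    "apache_httpd_injector.json", punchlets, resources, "[logs][log]", 1, verbose=True
)
# shlex.join quotes the stream argument, whose brackets a shell could otherwise expand.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a terminal.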

Warning

Use the --stream option to send data through the right stream, and make sure you get the expected output before proceeding further.
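
Checking the expected output by eye does not scale beyond a few fields. The sketch below assumes that you have copied the output tuple printed by the injector into a JSON string. It compares a handful of fields against the values you expect. The helper names (get_path, check_output) and the dotted-path notation are illustrative choices, not a platform feature.

```python
import json


def get_path(document, dotted_path):
    """Return the value at a dotted path such as 'logs.log.type', or None if absent."""
    node = document
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_output(output_tuple, expected):
    """Return a list of (path, expected, actual) for every field that does not match."""
    mismatches = []
    for path, wanted in expected.items():
        actual = get_path(output_tuple, path)
        if actual != wanted:
            mismatches.append((path, wanted, actual))
    return mismatches


# Excerpt of the output tuple shown above, pasted as JSON.
output_tuple = json.loads("""
{"logs": {"log": {"app": {"method": "GET", "return": {"code": "200"}},
                  "init": {"host": {"ip": "128.216.77.224"}},
                  "session": {"out": {"byte": 23279}},
                  "type": "web",
                  "action": "OK"}}}
""")

expected = {
    "logs.log.app.method": "GET",
    "logs.log.app.return.code": "200",
    "logs.log.init.host.ip": "128.216.77.224",
    "logs.log.session.out.byte": 23279,
    "logs.log.type": "web",
}

mismatches = check_output(output_tuple, expected)
print("OK" if not mismatches else mismatches)  # prints "OK"
```

Keeping one such expected dictionary per sample log gives you a small regression suite to replay whenever a punchlet changes.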

Run a performance test:

Finally, check the overall performance of your punchlet(s) by sending a large volume of data through your parsing.punch or your complete chain of punchlets, using the following command:

punchplatform-log-injector.sh -c apache_httpd_injector.json --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch --resources standard/apache_httpd/http_codes.json,standard/apache_httpd/taxonomy.json --stream [logs][log] -t 50000 -n 1000000

Output example:

registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering resource: standard/apache_httpd/http_codes.json
registering resource: standard/apache_httpd/taxonomy.json
registering groks from /home/punch/Bureau/punch-standalone-6.0.0/conf/resources/punch/patterns
compiling ...
punchlets compiled
[Thu Dec 05 17:40:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Thu Dec 05 17:40:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 16074      rate (1/s): 8033.0 
[Thu Dec 05 17:40:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 60674      rate (1/s): 22293.0
[Thu Dec 05 17:40:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6     sent-msg : 106849     rate (1/s): 23074.0
[Thu Dec 05 17:40:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8     sent-msg : 152797     rate (1/s): 22971.5
[Thu Dec 05 17:41:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10    sent-msg : 197579     rate (1/s): 22389.5
[Thu Dec 05 17:41:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12    sent-msg : 242471     rate (1/s): 22444.5
[Thu Dec 05 17:41:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14    sent-msg : 286735     rate (1/s): 22130.5
[Thu Dec 05 17:41:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16    sent-msg : 331430     rate (1/s): 22345.5
[Thu Dec 05 17:41:08 CET 2019] client.apache_httpd_injector.json0 duration (s): 18    sent-msg : 376152     rate (1/s): 22359.0
[Thu Dec 05 17:41:10 CET 2019] client.apache_httpd_injector.json0 duration (s): 20    sent-msg : 420825     rate (1/s): 22335.0
[Thu Dec 05 17:41:12 CET 2019] client.apache_httpd_injector.json0 duration (s): 22    sent-msg : 465371     rate (1/s): 22271.0
[Thu Dec 05 17:41:14 CET 2019] client.apache_httpd_injector.json0 duration (s): 24    sent-msg : 509629     rate (1/s): 22127.5
[Thu Dec 05 17:41:16 CET 2019] client.apache_httpd_injector.json0 duration (s): 26    sent-msg : 553725     rate (1/s): 22046.0
[Thu Dec 05 17:41:18 CET 2019] client.apache_httpd_injector.json0 duration (s): 28    sent-msg : 598827     rate (1/s): 22549.0
[Thu Dec 05 17:41:20 CET 2019] client.apache_httpd_injector.json0 duration (s): 30    sent-msg : 644384     rate (1/s): 22776.5
[Thu Dec 05 17:41:22 CET 2019] client.apache_httpd_injector.json0 duration (s): 32    sent-msg : 689784     rate (1/s): 22698.5
[Thu Dec 05 17:41:24 CET 2019] client.apache_httpd_injector.json0 duration (s): 34    sent-msg : 735972     rate (1/s): 23091.5
[Thu Dec 05 17:41:26 CET 2019] client.apache_httpd_injector.json0 duration (s): 36    sent-msg : 781986     rate (1/s): 22994.5
[Thu Dec 05 17:41:28 CET 2019] client.apache_httpd_injector.json0 duration (s): 38    sent-msg : 827973     rate (1/s): 22991.5
[Thu Dec 05 17:41:30 CET 2019] client.apache_httpd_injector.json0 duration (s): 40    sent-msg : 873869     rate (1/s): 22946.5
[Thu Dec 05 17:41:32 CET 2019] client.apache_httpd_injector.json0 duration (s): 42    sent-msg : 919993     rate (1/s): 23060.0
[Thu Dec 05 17:41:34 CET 2019] client.apache_httpd_injector.json0 duration (s): 44    sent-msg : 965770     rate (1/s): 22875.6

Note

If you look at the rate indicator, you can see that, after a short warm-up, our chain of punchlets processes between 22 and 23 kEPS (thousand events per second), which indicates that it is well optimised.
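
Rather than reading the rate column by eye, you can extract it from the injector output and average it. The sketch below assumes the line format shown in the output example above and treats the first sample as warm-up. The regular expression and the warmup_samples parameter are assumptions made for illustration.

```python
import re
from statistics import mean

# Matches the "rate (1/s): 22293.0" part of an injector progress line.
RATE_PATTERN = re.compile(r"rate \(1/s\):\s*([\d.]+)")


def parse_rates(lines):
    """Extract the rate value from every progress line that contains one."""
    rates = []
    for line in lines:
        match = RATE_PATTERN.search(line)
        if match:
            rates.append(float(match.group(1)))
    return rates


def steady_state_rate(rates, warmup_samples=1):
    """Average the rates after discarding the first warm-up samples."""
    steady = rates[warmup_samples:]
    if not steady:
        raise ValueError("not enough samples after warm-up")
    return mean(steady)


# First progress lines of the output example above.
sample_output = """\
punchlets compiled
[Thu Dec 05 17:40:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Thu Dec 05 17:40:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 16074      rate (1/s): 8033.0
[Thu Dec 05 17:40:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 60674      rate (1/s): 22293.0
[Thu Dec 05 17:40:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6     sent-msg : 106849     rate (1/s): 23074.0
[Thu Dec 05 17:40:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8     sent-msg : 152797     rate (1/s): 22971.5
"""

rates = parse_rates(sample_output.splitlines())
steady = steady_state_rate(rates)
print(f"steady-state rate: {steady:.1f} events/s")  # steady-state rate: 22779.5 events/s
```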

Interestingly, we found a correlation between the throughput measured locally with this log injector and what you can expect in practice in your production environment. By running performance tests on all our standard parsers, we measured a 52% ratio between the injector test and proper end-to-end processing of data on the same machine. Moreover, provided that topologies are properly configured, performance scales proportionally with the number of workers and executors of the Punch Bolt.
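
As a rough sizing aid, the arithmetic can be sketched as follows. It rests on two assumptions that you should verify on your own platform. The first is that the 52% ratio means end-to-end throughput is about 52% of the injector rate. The second is that throughput grows linearly with the number of executors. The function name and parameters are hypothetical.

```python
# Ratio reported in this HOWTO for the standard parsers, on the same machine.
INJECTOR_TO_END_TO_END_RATIO = 0.52


def estimate_end_to_end_eps(injector_eps, executors=1, ratio=INJECTOR_TO_END_TO_END_RATIO):
    """Estimate end-to-end events per second from a local injector measurement.

    Assumes end-to-end rate = injector rate * ratio, scaled linearly by executors.
    """
    if injector_eps < 0:
        raise ValueError("injector_eps must not be negative")
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return injector_eps * ratio * executors


# With the lower bound of the 22 to 23 kEPS range measured above:
single = estimate_end_to_end_eps(22000)
scaled = estimate_end_to_end_eps(22000, executors=4)
print(f"one executor:   about {single:.0f} events/s")
print(f"four executors: about {scaled:.0f} events/s")
```

Treat the result as an order of magnitude only. Measure the real end-to-end rate before committing to a sizing.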