HOWTO test a parser before going to production

Why do that

In a production context, most of the value lies in the running code (parsers, aggregations, anomaly jobs). To improve the quality of the service, the platform owner has to update this code frequently. This HOWTO describes one example method for getting such updates to production.

The PunchPlatform Professional Services team provides a set of standard parsers. They are best-practice examples!

From the Professional Services team's experience:

-   Most parsers are not standard.
-   Before each update, the platform owner has to test the change to
    check its performance impact and its result. This does not
    require a large machine, and testing locally is preferable: you
    can work anywhere, use the PunchPlatform Sublime Text plugin,
    and customize your own environment.
-   Do not update code on Fridays ...

Prerequisites

  • A recent PunchPlatform standalone

What to do

Install the Standalone

$ ./install.sh -s
$ source ~/.bashrc

No need to start anything, you will be able to run your tests quickly.

Identify the parser (punchlets)

For example, let's consider that your topology chains the following punchlets:

  • standard/common/input.punch
  • standard/common/parsing_syslog_header.punch
  • standard/apache_httpd/parsing.punch
  • standard/apache_httpd/enrichment.punch
  • standard/apache_httpd/normalization.punch

You may also need external resources for enrichment:

  • standard/apache_httpd/http_codes.json
  • standard/apache_httpd/taxonomy.json

Identify your raw logs:

  • Take them from production
  • Use a default log-injector (for instance $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json)
  • Or build your own, more complex injector configuration file (see punchplatform-log-injector.sh)

Perform unit tests on your punchlet(s)

The following command lets you check whether a log is correctly processed by your chain of punchlets.

punchplatform-log-injector.sh -c apache_httpd_injector.json --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch --resources standard/apache_httpd/http_codes.json,standard/apache_httpd/taxonomy.json --stream [logs][log] -n 1 -v

We get the following:

registering punchlet: standard/common/input.punch
...
19:52:07 c.t.s.c.p.p.resources [INFO] message="registered regular tuple" size=57 resource_name="http_codes"
...
punchlets compiled
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 starts ....
input string ===========================
Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] "GET /images/KSC-94EC-412-small.gif HTTP/1.0" 200 23279 "http://www.example.com/start.html" "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
input tuple ===========================
{
  "logs": {
    "raw_log": "Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] "
  }
}
19:52:11 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host ip" host_ip=127.0.0.1
19:52:11 c.t.s.c.p.u.PunchEnvironment [INFO] message="detected host name" host_name=MacBook-Pro-de-loic.local
19:52:11 c.t.s.c.p.p.r.o.Contains [INFO] built index for 189 entries for key set [code] in 8.033596ms
output tuple ===========================
{
  "logs": {
    "data": "128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ",
    "log": {
      "app": {
        "method": "GET",
        "return": {
          "code": "200"
        }
      },
      "col": {
        "host": {
          "name": "MacBook-Pro-de-loic.local"
        }
      },
      "obs": {
        "host": {
          "name": "host0"
        },
        "ts": "2012-12-31T01:00:00.000+01:00"
      },
      "init": {
        "process": {
          "name": "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
        },
        "host": {
          "ip": "128.216.77.224"
        }
      },
      "lmc": {
        "parse": {
          "host": {
            "ip": "127.0.0.1",
            "name": "MacBook-Pro-de-loic.local"
          },
          "ts": "2017-11-22T19:52:11.435+01:00"
        }
      },
      "session": {
        "out": {
          "byte": 23279
        }
      },
      "channel": "unknown",
      "type": "web",
      "target": {
        "host": {
          "name": "host0"
        },
        "uri": {
          "urn": "/images/KSC-94EC-412-small.gif"
        }
      },
      "taxo": {
        "nf": {
          "sev": "2",
          "alarm": "160018"
        }
      },
      "size": 307,
      "web": {
        "header": {
          "referer": "http://www.example.com/start.html"
        }
      },
      "vendor": "unknown",
      "action": "OK",
      "rep": {
        "host": {
          "name": "host0"
        },
        "ts": "2017-11-22T19:52:11.000+01:00"
      },
      "tenant": "unknown"
    },
    "raw_log": "Nov 22 19:52:11 host0 128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ",
    "es_index": "unknown-events-2017.11.22"
  }
}
output string ===========================
{"logs":{"data":"128.216.77.224 - frank [31/Dec/2012:01:00:00 +0100] ","es_index":"unknown-events-2017.11.22"}}
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 duration (s): 0     sent-msg : 1          rate (1/s): 17.2
[Wed Nov 22 19:52:11 CET 2017] client.apache_httpd_injector.json0 stopped.
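
The command line above is long and easy to mistype. As a minimal sketch, the comma-separated lists can be assembled from one entry per line. The helper name join_csv and the variables PUNCHLETS and RESOURCES are illustrative and not part of PunchPlatform; the script only prints the resulting command so that it can be reviewed first.

```shell
#!/bin/sh
# Join all arguments with commas: join_csv a b c prints a,b,c
# (join_csv is a hypothetical helper, not a PunchPlatform tool).
join_csv() {
    out=""
    for item in "$@"; do
        if [ -z "$out" ]; then
            out="$item"
        else
            out="$out,$item"
        fi
    done
    printf '%s\n' "$out"
}

# Punchlets, in the order the topology chains them.
PUNCHLETS=$(join_csv \
    standard/common/input.punch \
    standard/common/parsing_syslog_header.punch \
    standard/apache_httpd/parsing.punch \
    standard/apache_httpd/enrichment.punch \
    standard/apache_httpd/normalization.punch)

# External resources needed by the enrichment punchlet.
RESOURCES=$(join_csv \
    standard/apache_httpd/http_codes.json \
    standard/apache_httpd/taxonomy.json)

# Print the command for review; remove the leading "echo" to run it.
echo punchplatform-log-injector.sh -c apache_httpd_injector.json \
    --punchlets "$PUNCHLETS" --resources "$RESOURCES" \
    --stream '[logs][log]' -n 1 -v
```

The stream name is quoted here because an unquoted [logs][log] is a shell glob pattern and could be expanded if a matching two-letter file name exists in the current directory. The injector receives the same argument either way.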

Warning

Make sure that you send data through the right stream using the --stream option, and that you get the expected output, before proceeding further.
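
Checking the expected output by eye does not scale to many test logs. The following sketch shows one possible way to assert on a few string fields of the verbose output once it has been saved to a file. The helper has_field is hypothetical, and the sample lines are copied from the output tuple shown above; in practice you would redirect the injector output to a file instead.

```shell
#!/bin/sh
# has_field FILE KEY VALUE
# Succeeds if FILE contains the literal text  "KEY": "VALUE".
# Only string-valued fields are covered, as printed by the -v output.
# (has_field is a hypothetical helper, not a PunchPlatform tool.)
has_field() {
    grep -qF -- "\"$2\": \"$3\"" "$1"
}

# Demo input: a few lines copied from the output tuple shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
        "method": "GET",
          "code": "200"
      "type": "web",
      "action": "OK",
EOF

# Check each expected key=value pair and remember any failure.
status=0
for pair in "method=GET" "code=200" "type=web" "action=OK"; do
    key=${pair%%=*}
    value=${pair#*=}
    if has_field "$sample" "$key" "$value"; then
        echo "ok      $key=$value"
    else
        echo "MISSING $key=$value"
        status=1
    fi
done
rm -f "$sample"
```

After the loop, status is 0 only if every expected field was found, so it can be used as the exit code of a test script. This is a plain text match on one line, not a JSON parse: it ignores nesting, so a key such as "name" that appears under several parents cannot be told apart this way.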

Run a performance test

Finally, check the overall performance of your punchlet(s) by sending a large volume of data through your parsing.punch or your complete chain of punchlets, using the following command:

punchplatform-log-injector.sh -c apache_httpd_injector.json --punchlets standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch --resources standard/apache_httpd/http_codes.json,standard/apache_httpd/taxonomy.json --stream [logs][log] -t 50000 -n 1000000

Output example:

registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering resource: standard/apache_httpd/http_codes.json
registering resource: standard/apache_httpd/taxonomy.json
registering groks from /home/punch/Bureau/punchplatform-standalone-6.0.0/conf/resources/punch/patterns
compiling ...
punchlets compiled
[Thu Dec 05 17:40:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Thu Dec 05 17:40:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 16074      rate (1/s): 8033.0 
[Thu Dec 05 17:40:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 60674      rate (1/s): 22293.0
[Thu Dec 05 17:40:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6     sent-msg : 106849     rate (1/s): 23074.0
[Thu Dec 05 17:40:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8     sent-msg : 152797     rate (1/s): 22971.5
[Thu Dec 05 17:41:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10    sent-msg : 197579     rate (1/s): 22389.5
[Thu Dec 05 17:41:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12    sent-msg : 242471     rate (1/s): 22444.5
[Thu Dec 05 17:41:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14    sent-msg : 286735     rate (1/s): 22130.5
[Thu Dec 05 17:41:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16    sent-msg : 331430     rate (1/s): 22345.5
[Thu Dec 05 17:41:08 CET 2019] client.apache_httpd_injector.json0 duration (s): 18    sent-msg : 376152     rate (1/s): 22359.0
[Thu Dec 05 17:41:10 CET 2019] client.apache_httpd_injector.json0 duration (s): 20    sent-msg : 420825     rate (1/s): 22335.0
[Thu Dec 05 17:41:12 CET 2019] client.apache_httpd_injector.json0 duration (s): 22    sent-msg : 465371     rate (1/s): 22271.0
[Thu Dec 05 17:41:14 CET 2019] client.apache_httpd_injector.json0 duration (s): 24    sent-msg : 509629     rate (1/s): 22127.5
[Thu Dec 05 17:41:16 CET 2019] client.apache_httpd_injector.json0 duration (s): 26    sent-msg : 553725     rate (1/s): 22046.0
[Thu Dec 05 17:41:18 CET 2019] client.apache_httpd_injector.json0 duration (s): 28    sent-msg : 598827     rate (1/s): 22549.0
[Thu Dec 05 17:41:20 CET 2019] client.apache_httpd_injector.json0 duration (s): 30    sent-msg : 644384     rate (1/s): 22776.5
[Thu Dec 05 17:41:22 CET 2019] client.apache_httpd_injector.json0 duration (s): 32    sent-msg : 689784     rate (1/s): 22698.5
[Thu Dec 05 17:41:24 CET 2019] client.apache_httpd_injector.json0 duration (s): 34    sent-msg : 735972     rate (1/s): 23091.5
[Thu Dec 05 17:41:26 CET 2019] client.apache_httpd_injector.json0 duration (s): 36    sent-msg : 781986     rate (1/s): 22994.5
[Thu Dec 05 17:41:28 CET 2019] client.apache_httpd_injector.json0 duration (s): 38    sent-msg : 827973     rate (1/s): 22991.5
[Thu Dec 05 17:41:30 CET 2019] client.apache_httpd_injector.json0 duration (s): 40    sent-msg : 873869     rate (1/s): 22946.5
[Thu Dec 05 17:41:32 CET 2019] client.apache_httpd_injector.json0 duration (s): 42    sent-msg : 919993     rate (1/s): 23060.0
[Thu Dec 05 17:41:34 CET 2019] client.apache_httpd_injector.json0 duration (s): 44    sent-msg : 965770     rate (1/s): 22875.6

Note

Looking at the rate indicator, you can see that, once past the first sample, our chain of punchlets processes between 22 and 23 kEPS (thousand events per second), which indicates that it is well optimised.
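
To compare two versions of a parser, a single figure is easier to handle than a column of rates. The sketch below averages the rate values of a saved injector output with awk. The function name mean_rate is illustrative, and skipping the first sample (the ramp-up, 8033.0 in the output above) is our own choice rather than a PunchPlatform convention. The sample input is the first four rate lines of the output above.

```shell
#!/bin/sh
# Print the mean of the "rate (1/s):" values read on stdin,
# ignoring the first sample, which covers the ramp-up.
# (mean_rate is a hypothetical helper, not a PunchPlatform tool.)
mean_rate() {
    awk '
        {
            key = "rate (1/s):"
            i = index($0, key)
            if (i == 0) next              # not a rate line
            seen++
            if (seen == 1) next           # skip the ramp-up sample
            sum += substr($0, i + length(key)) + 0
            count++
        }
        END { if (count > 0) printf "%.1f\n", sum / count }
    '
}

mean_rate <<'EOF'
[Thu Dec 05 17:40:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 16074      rate (1/s): 8033.0
[Thu Dec 05 17:40:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 60674      rate (1/s): 22293.0
[Thu Dec 05 17:40:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6     sent-msg : 106849     rate (1/s): 23074.0
[Thu Dec 05 17:40:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8     sent-msg : 152797     rate (1/s): 22971.5
EOF
# prints 22779.5
```

With the full output saved to a file, the same function can be fed with a redirection such as mean_rate < injector_output.txt, where the file name is an example.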

Interestingly, we found a correlation between what you measure locally with the log injector and what you can expect in practice in your production environment. By running performance tests on all our standard parsers, we observed a 52% ratio between the injector test and full end-to-end processing of data on the same machine. Moreover, assuming that topologies are properly configured, performance scales proportionally with the number of workers and executors of the Punch Bolt.
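
As a rough worked example of this ratio, the sketch below turns an injector rate into an end-to-end estimate. It assumes that the 52% figure means end-to-end processing reaches about 52% of the injector rate on the same machine. The function name estimate_eps is illustrative, and the result is only an order of magnitude, not a guarantee for a given production setup.

```shell
#!/bin/sh
# estimate_eps INJECTOR_RATE [RATIO]
# Print INJECTOR_RATE * RATIO rounded to the nearest integer.
# RATIO defaults to 0.52, the ratio reported above (assumed to mean
# end-to-end rate / injector rate on the same machine).
# (estimate_eps is a hypothetical helper, not a PunchPlatform tool.)
estimate_eps() {
    awk -v rate="$1" -v ratio="${2:-0.52}" \
        'BEGIN { printf "%.0f\n", rate * ratio }'
}

estimate_eps 22000   # prints 11440
estimate_eps 23000   # prints 11960
```

For the chain tested above, measured at 22 to 23 kEPS with the injector, this gives roughly 11.4 to 12 kEPS end to end on the same machine.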