Tutorial: Write a Production-Grade Parser

You have succeeded in writing a first decent parser. The next step is to make sure you are in line with developer best practices, so that you can maintain your code and publish it to the community.

After reading this chapter, you will be able to submit a parser to the PunchPlatform standard_log_parsers.

Make it modular

The punchlet we have so far is compact. But it does several different things you are likely to repeat for every parser:

  • extract the timestamp from the header
  • store the original log in a field for later archiving
  • parse vendor specific fields
  • normalize
  • cleanup
  • etc.

Be aware that copying and pasting a piece of code to produce another one is NOT a good idea. The day you decide to change something in one parser, you will have to repeat the change in every parser. It will never work.

Instead we will split our punchlet into three. The first one takes care of the [message] issue, plus a few additional goodies: it puts an input timestamp on the log to keep track of its arrival date in the PunchPlatform, adds a unique id, etc. That punchlet in fact already exists, you simply need to reuse it. It is located in:

$PUNCHPLATFORM_CONF_DIR/resource/punch/standard/common/input.punch 

Next we take care of the timestamp header. Again, a punchlet already exists:

$PUNCHPLATFORM_CONF_DIR/resource/punch/standard/common/parsing_syslog_header.punch 

Last, we put the rest (the vendor-specific parsing) in a dedicated parsing punchlet:

$PUNCHPLATFORM_CONF_DIR/resource/punch/standard/arkoon_ngfw/parsing.punch 

Instead of one punchlet, we now have three. We simply need to chain them in the log flow.

Conclusion: it is best to separate the parsing of your log body from the parsing of its headers, one concern per punchlet, so that each punchlet is reusable.
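
To make the chaining concrete, here is a sketch of what a single punchlet node running the three punchlets in sequence could look like. It mirrors the test topology shown later in this chapter; whether the punchlet setting accepts a list may depend on your release, so check your platform documentation:

{
  "type": "punchlet_node",
  "settings": {
    "punchlet": [
      "standard/common/input.punch",
      "standard/common/parsing_syslog_header.punch",
      "standard/arkoon_ngfw/parsing.punch"
    ]
  },
  "storm_settings": {
    "component": "punchlet_node"
  }
}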

Manage parsing errors

Think about parsing errors!

Each grok, csv, dissect or kv call must be protected against parsing errors: these operators return false when they fail to match, so test the result and raise an exception with a meaningful message:

if (!dissect("%{?a} %{&a}").on([logs][log]).into([dissect])) {
    raise("ADAPT YOUR MESSAGE");
}
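
Depending on how your topology is configured, a raised exception typically lets the platform route the failed log to an error handling path instead of silently dropping it, so make the message explicit enough to ease later troubleshooting.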

Benchmarking

Is your punchlet efficient? The only way to know is to benchmark it. To do so, we will rely on the punchplatform-log-injector.sh utility.

By convention, we store performance tests in $PUNCHPLATFORM_CONF_DIR/resources/injector/<tenant>/perf.

To have a reproducible test, we will create a short shell script. To test the standard apache_httpd parser, we will set up a test using these two steps.

1) Write a local parser that copies [logs][raw_log] into [logs][log]. To do so, copy and paste this code into a new file called /resources/punch/standard/common/local_for_test_perf.punch:

{
    // copy the injected payload so the downstream punchlets find it in [logs][log]
    [logs][log] = [logs][raw_log];
}

2) Test the parser performance using this command:

punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json
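
Since this command is long, you can save it as the short shell script mentioned above, stored at the conventional location. A minimal sketch (the file name is only a suggestion):

#!/bin/bash
# Hypothetical wrapper script, e.g. saved as
# $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/perf/apache_httpd_perf.sh
punchplatform-log-injector.sh \
  -c "$PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json" \
  --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json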

Let's see how this command works:

  1. We call the punchplatform-log-injector.sh script; it does the heavy lifting.
  2. The -c option sets the injector file used to generate the input logs.
  3. --punchlets takes the comma-separated list of the punchlets that make up our pipeline.

Now run the command and see what happens:

punchplatform-log-injector.sh  -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json
registering punchlet: standard/common/local_for_test_perf.punch
registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering punchlet: standard/apache_httpd/taxonomy.json
registering punchlet: standard/apache_httpd/http_codes.json
registering groks from /home/gmfmi/punchplatform/standalone/punch-standalone-5.1.2/conf/resources/punch/patterns
compiling ...
punchlets compiled
running punchlets using infinite loop
running punchlet at maximum throughput
[Tue Feb 19 10:28:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Tue Feb 19 10:28:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 1065       rate (1/s): 532.5  
[Tue Feb 19 10:28:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 24068      rate (1/s): 11497.0
[Tue Feb 19 10:28:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6     sent-msg : 55976      rate (1/s): 15944.5
[Tue Feb 19 10:28:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8     sent-msg : 89516      rate (1/s): 16768.0
[Tue Feb 19 10:29:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10    sent-msg : 123541     rate (1/s): 17011.0
[Tue Feb 19 10:29:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12    sent-msg : 158056     rate (1/s): 17247.4
[Tue Feb 19 10:29:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14    sent-msg : 192624     rate (1/s): 17278.5
[Tue Feb 19 10:29:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16    sent-msg : 227265     rate (1/s): 17319.5

We are now in an infinite loop that tries to process as many logs as possible. This way, we can measure the end-to-end performance of the punchlet pipeline. To get a meaningful result, a good practice is to let the script run for at least 5 minutes (as a warm-up) and then take the stabilized rate as your result; in the run above, the rate settles around 17,300 logs per second.

Important

This benchmark heavily depends on your testing platform setup. The tool is not meant to give absolute or reference rates, but to compare a new parser's behaviour with others. It can be helpful to measure a parser's performance after a refactoring or an improvement.

Going to production

Before going to production, you must write unit tests. Several alternatives are available:

  • punchplatform-log-injector.sh
  • punchlinectl

Test with punchplatform-log-injector

1) Write a log injector configuration file

{
  "destination" : { "proto" : "tcp", "host" : "127.0.0.1", "port" : 9901 },
  "load" :{
    "total_messages" : 1,
    "stats_publish_interval" : "1s",
    "message_throughput" : 1
  },
  "message" : {
    "payloads" : [
      "foo bar"
    ],
    "fields" : {
    }
  }
}
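
This configuration declares a single "foo bar" payload, sent once (total_messages: 1) at a rate of one message per second: that is all a functional unit test needs.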

2) Write your parser

{
    [logs][log] = [logs][raw_log];
    // %{?a} captures the first token without emitting it;
    // %{&a} then uses that token's value as the key for the second token
    dissect("%{?a} %{&a}").on([logs][log]).into([dissect]);
}

3) Run the test:

punchplatform-log-injector.sh -c injector_conf.json --punchlets parser_to_test.punch

The punchlets must be placed under resource/punch.
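
Given the usual dissect semantics described above, the single "foo bar" payload should produce a [dissect] section keyed by the first token, i.e. {"foo":"bar"}.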

Test with punchlinectl

1) Write a test topology

{
  "dag":[ 
    {
      "type": "generator_input",
      "settings": {
        "messages": [
          "{\"type\":\"fw\",\"obs\":{\"ts\":\"2018-11-12T23:56:03.000Z\"},\"init\":{\"host\":{\"ip\":\"127.0.0.1\"}},\"target\":{\"host\":{\"port\":1}}}"
        ]
      },
      "storm_settings": {
        "component": "generator"
      }
    },
    {
      "type": "punchlet_node",
      "settings": {
        "punchlet": "./your_punchlet.punch"
      },
      "storm_settings": {
        "component": "punchlet_node",
        "subscribe": [
          {
            "component": "generator"
          }
        ]
      }
    },
    {
      "type": "punchlet_node",
      "bolts_settings": {
        "punchlet_code": "{print(root);}"
      },
      "storm_settings": {
        "component": "punchlet_node",
        "subscribe": [
          {
            "component": "generator"
          }
        ]
      }
    }
  ]
}
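
In this dag, the generator emits the sample message, the punchlet node applies your punchlet to it, and the final inline punchlet simply prints the resulting document so you can inspect the output.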

2) Run the topology

punchlinectl parsing_test_topology.json

After checking that every unit test is okay, set up a channel in your standalone that can handle your real logs (an existing channel scheme should do the job; see the examples in your standalone) and try to send some of them into it. Check in Kibana that all your logs look as expected, and then you can put your punchlets in the real flow.

Contributing

The PunchPlatform needs you just as you need the PunchPlatform. Your work is extremely valuable: having a standard base of punchlet parsers is our best asset.

  • If you use a standard log parser, you are assured of support in case of problems. So if you manage to make your parser standard (or update an existing one), you will be helped by experts;
  • If you need a new parser, first check the standard_log_parsers chapter. Who knows, a teammate may already have written it? By contributing, you take part in this virtuous circle.

Write development unit tests

If you package your punchlets using punch-compatible archives, the puncher lets you automatically execute all or some of your unit or sample tests. Simply execute it with the '-T' option as follows:

punchplatform-puncher.sh -T <path-to-your-repository>

In this mode, the puncher scans your folders looking for punchlet group manifest files. Here is an example:

---
spec:
  punchlets:
    - punchlet: parser.punch
      resources:
      - resources/color_codes.json
      groks:
      - groks/pattern.grok
      inputStream:
      - name: logs
        fields:
        - name: data
          type: string
    - punchlet: enrich.punch
      inputStream:
      - name: logs
        fields:
        - name: data
          type: string
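
Each entry under punchlets declares one punchlet together with the resource files and grok patterns it needs, plus the input stream and fields it expects to receive.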

You can define two types of tests: unit tests, to precisely test a single punchlet or a chain of punchlets, and (so-called) sample tests, where you simply ingest a file of sample logs and check that the chain of punchlets defined in your manifest file reports no error when processing your samples.

Sample tests automatically use the 'inputStream' specification of your first punchlet to determine the single input stream and field format that punchlet expects.

To play a single test file, simply pass in the path to that test file. This works for both unit and sample test files.

punchplatform-puncher.sh -T ./src/com/mycompany/sample/test/unit_chain.json

Here is an example unit test:

{
  "input": {
    "message": "168.168.168.168 - - [18/Aug/2011:06:00:14 -0700] \"GET /style2.css HTTP/1.1\" 200 659433 \"http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html\" \"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\""
  },
  "output": {
    "includes": {
      "app": {
        "method": "GET",
        "return": {
          "code": "200"
        }
      },
      "init": {
        "host": {
          "ip": "168.168.168.168"
        },
        "process": {
          "name": "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
        }
      },
      "obs": {
        "ts": "2011-08-18T15:00:14.000+02:00"
      },
      "session": {
        "out": {
          "byte": 659433
        }
      },
      "target": {
        "uri": {
          "urn": "/style2.css"
        }
      },
      "type": "web"
    }
  }
}
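
Note that the 'includes' clause, as its name suggests, asserts that the document produced by the punchlets contains at least these fields and values; extra fields in the output do not make the test fail.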