Tutorial: Write a Production-Grade Parser

You have written a first decent parser. The next step is to make sure it follows developer best practices, so that you can maintain it and publish it to the community.

After reading this chapter, you will be able to contribute a parser to the PunchPlatform standard_log_parsers.

Make it modular

The punchlet we have so far is compact, but it does several different things that you are likely to repeat in every parser:

extract the timestamp from the header
store the original log in a field for later archiving
parse vendor specific fields
normalize
cleanup
etc.

Copying and pasting code from one parser to produce another is NOT a good idea: the day you change something in one parser, you will have to repeat the change in every parser, and that never works in practice.

Instead, we split our punchlet into three. The first one takes care of the [ [message ]] issue, plus a few additional goodies: it sets an input timestamp to keep track of the arrival date of the log in the PunchPlatform, adds a unique id, etc. That punchlet already exists; you simply need to reuse it. It is located in:

$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/input.punch

Next, we take care of the timestamp header. Again, a punchlet already exists:

$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/parsing_syslog_header.punch

Last, we put the rest (the parsing part) in a parsing punchlet:

$PUNCHPLATFORM_CONF_DIR/resources/punch/standard/arkoon_ngfw/parsing.punch

Instead of one punchlet, we now have three. We simply need to chain them in the log flow.

Conclusion: it is best to separate the parsing of your log from the parsing of its headers, one concern per punchlet, so that each piece is reusable.
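Conceptually, chaining means applying each punchlet, in turn, to the same document. The Java sketch below models that idea with plain functions over a map; it is not the PunchPlatform API. Fields are flattened into punch-like string keys, the simplified "<timestamp> <host> <body>" header is an assumption, [obs][ts], [app][method] and [target][uri][urn] are borrowed from the apache_httpd unit test later in this chapter, and [logs][uid], [obs][host][name] and [logs][body] are hypothetical field names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.UnaryOperator;

public class PunchletChain {

    // Fields are flattened into punch-like keys such as "[logs][log]" to keep the sketch short.

    // Stand-in for input.punch: tag the log with a unique id (hypothetical field name).
    static final UnaryOperator<Map<String, String>> INPUT = doc -> {
        doc.put("[logs][uid]", UUID.randomUUID().toString());
        return doc;
    };

    // Stand-in for parsing_syslog_header.punch: assumes a simplified "<timestamp> <host> <body>" header.
    static final UnaryOperator<Map<String, String>> HEADER = doc -> {
        String[] parts = doc.get("[logs][log]").split(" ", 3);
        doc.put("[obs][ts]", parts[0]);
        doc.put("[obs][host][name]", parts[1]);
        doc.put("[logs][body]", parts[2]);
        return doc;
    };

    // Stand-in for a vendor parsing.punch: it only reads the body, never the header.
    static final UnaryOperator<Map<String, String>> PARSING = doc -> {
        String[] parts = doc.get("[logs][body]").split(" ", 2);
        doc.put("[app][method]", parts[0]);
        doc.put("[target][uri][urn]", parts[1]);
        return doc;
    };

    // Chaining: every stage is applied, in order, to the same document.
    static Map<String, String> run(String rawLog, List<UnaryOperator<Map<String, String>>> stages) {
        Map<String, String> doc = new HashMap<>();
        doc.put("[logs][log]", rawLog);
        for (UnaryOperator<Map<String, String>> stage : stages) {
            doc = stage.apply(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = run("2019-02-19T10:28:50Z host1 GET /index.html",
            List.of(INPUT, HEADER, PARSING));
        System.out.println(doc.get("[obs][ts]") + " " + doc.get("[app][method]") + " " + doc.get("[target][uri][urn]"));
        // prints 2019-02-19T10:28:50Z GET /index.html
    }
}
```

The point of the split is that the header stage knows nothing about the vendor format and the parsing stage knows nothing about the header, so a fix in either one applies to every parser that chains it.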

Manage parsing errors

Think about parsing errors!

Each grok, csv, dissect or kv operator must be protected against parsing errors: check its result and raise an explicit error when it fails, for example:

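{
    // the operator returns false when the log does not match the pattern
    if (!dissect("%{?a} %{&a}").on([logs][log]).into([dissect])) {
        throw new PunchRuntimeException("ADAPT YOUR MESSAGE");
    }
}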
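The same fail-fast idea can be sketched in plain Java: the parsing step reports success with a boolean, and the caller turns a failure into an exception with an explicit message instead of letting a half-parsed log continue down the flow. The ParseGuard class, its regex and the parseOrThrow method are hypothetical, and IllegalStateException stands in for PunchRuntimeException to keep the sketch free of PunchPlatform dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseGuard {

    // Hypothetical format: "<method> <uri> <status>", e.g. "GET /style2.css 200"
    private static final Pattern REQUEST = Pattern.compile("^(\\S+) (\\S+) (\\d{3})$");

    // Like a punch operator: returns true only if the log matched and the fields were filled in.
    static boolean parse(String log, Map<String, String> into) {
        Matcher m = REQUEST.matcher(log);
        if (!m.matches()) {
            return false;
        }
        into.put("method", m.group(1));
        into.put("urn", m.group(2));
        into.put("code", m.group(3));
        return true;
    }

    // Fail fast with an explicit message instead of continuing with empty fields.
    static Map<String, String> parseOrThrow(String log) {
        Map<String, String> fields = new HashMap<>();
        if (!parse(log, fields)) {
            throw new IllegalStateException("request parsing failed on: " + log);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseOrThrow("GET /style2.css 200").get("code")); // prints 200
        try {
            parseOrThrow("not a request line");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Putting the offending log in the message, as done here, makes it much easier to find which input broke the parser.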

Benchmarking

Is your punchlet efficient? The only way to answer that question is to benchmark it. To do so, we rely on the punchplatform-log-injector.sh utility.

By convention, performance test files are stored in $PUNCHPLATFORM_CONF_DIR/resources/injector/<tenant>/perf.

To get a reproducible test, we use a short shell command. To test the standard apache_httpd parser, set up the test in two steps:

1) Write a local punchlet that copies [logs][raw_log] into [logs][log]. To do so, paste this code into a new file called $PUNCHPLATFORM_CONF_DIR/resources/punch/standard/common/local_for_test_perf.punch:

{
    [logs][log] = [logs][raw_log];
}

2) Test the parser performance using this command:

punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json

Let's see how this command works:

We call the punchplatform-log-injector.sh script, which does the heavy lifting.
The -c option sets the injector file used to generate the input logs.
--punchlets is a comma-separated list of the punchlets used in our pipeline; as shown here, it can also list JSON resource files such as taxonomy.json and http_codes.json.

Now run the command and see what happens:

punchplatform-log-injector.sh  -c $PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/apache_httpd_injector.json --punchlets standard/common/local_for_test_perf.punch,standard/common/input.punch,standard/common/parsing_syslog_header.punch,standard/apache_httpd/parsing.punch,standard/apache_httpd/enrichment.punch,standard/apache_httpd/normalization.punch,standard/apache_httpd/taxonomy.json,standard/apache_httpd/http_codes.json
registering punchlet: standard/common/local_for_test_perf.punch
registering punchlet: standard/common/input.punch
registering punchlet: standard/common/parsing_syslog_header.punch
registering punchlet: standard/apache_httpd/parsing.punch
registering punchlet: standard/apache_httpd/enrichment.punch
registering punchlet: standard/apache_httpd/normalization.punch
registering punchlet: standard/apache_httpd/taxonomy.json
registering punchlet: standard/apache_httpd/http_codes.json
registering groks from /home/gmfmi/punchplatform/standalone/punch-standalone-5.1.2/conf/resources/punch/patterns
compiling ...
punchlets compiled
running punchlets using infinite loop
running punchlet at maximum throughput
[Tue Feb 19 10:28:50 CET 2019] client.apache_httpd_injector.json0 starts ....
[Tue Feb 19 10:28:52 CET 2019] client.apache_httpd_injector.json0 duration (s): 2     sent-msg : 1065       rate (1/s): 532.5  
[Tue Feb 19 10:28:54 CET 2019] client.apache_httpd_injector.json0 duration (s): 4     sent-msg : 24068      rate (1/s): 11497.0
[Tue Feb 19 10:28:56 CET 2019] client.apache_httpd_injector.json0 duration (s): 6     sent-msg : 55976      rate (1/s): 15944.5
[Tue Feb 19 10:28:58 CET 2019] client.apache_httpd_injector.json0 duration (s): 8     sent-msg : 89516      rate (1/s): 16768.0
[Tue Feb 19 10:29:00 CET 2019] client.apache_httpd_injector.json0 duration (s): 10    sent-msg : 123541     rate (1/s): 17011.0
[Tue Feb 19 10:29:02 CET 2019] client.apache_httpd_injector.json0 duration (s): 12    sent-msg : 158056     rate (1/s): 17247.4
[Tue Feb 19 10:29:04 CET 2019] client.apache_httpd_injector.json0 duration (s): 14    sent-msg : 192624     rate (1/s): 17278.5
[Tue Feb 19 10:29:06 CET 2019] client.apache_httpd_injector.json0 duration (s): 16    sent-msg : 227265     rate (1/s): 17319.5

We are now in an infinite loop that tries to process as many logs as possible, which lets us measure the performance of the whole punchlet pipeline end to end. To get a meaningful result, let the command run for at least 5 minutes as a warm-up, then take the rate reported after that as your result.

Important

The results depend heavily on your test platform. This tool is not meant to give an absolute or reference rate, but to compare a parser's behaviour with others. For example, it helps to measure a parser's performance before and after a refactoring or an improvement.
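To reduce the injector output to a single number, you can average over a window that starts after the warm-up, using the cumulative sent-msg counter rather than one rate sample. The Java sketch below does this; SteadyRate and steadyRate are hypothetical helpers, and the sample values are the duration and sent-msg columns printed above. On a real run, start the window after about 300 seconds instead of 10.

```java
import java.util.Locale;

public class SteadyRate {

    // Average messages per second between the first sample taken at or after
    // warmupSeconds and the last sample, based on the cumulative "sent-msg" counter.
    static double steadyRate(long[] durationSeconds, long[] sentMessages, long warmupSeconds) {
        if (durationSeconds.length != sentMessages.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int first = -1;
        for (int i = 0; i < durationSeconds.length; i++) {
            if (durationSeconds[i] >= warmupSeconds) {
                first = i;
                break;
            }
        }
        int last = durationSeconds.length - 1;
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("not enough samples after warm-up");
        }
        long messages = sentMessages[last] - sentMessages[first];
        long seconds = durationSeconds[last] - durationSeconds[first];
        return (double) messages / seconds;
    }

    public static void main(String[] args) {
        // "duration (s)" and "sent-msg" columns from the injector output
        long[] duration = {2, 4, 6, 8, 10, 12, 14, 16};
        long[] sent = {1065, 24068, 55976, 89516, 123541, 158056, 192624, 227265};
        // Skip the first 10 s here; use about 300 s on a real run
        System.out.printf(Locale.ROOT, "steady rate: %.1f msg/s%n", steadyRate(duration, sent, 10));
        // prints steady rate: 17287.3 msg/s
    }
}
```

Using the counter difference over a window smooths out the per-interval jitter that individual rate samples show.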

Going to production

Before going to production, you must write unit tests. Several alternatives are available:

punchplatform-log-injector.sh
punchlinectl

Test with punchplatform-log-injector.sh

1) Write a log injector configuration file, for example injector_conf.json:

{
  "destination" : { "proto" : "tcp", "host" : "127.0.0.1", "port" : 9901 },
  "load" :{
    "total_messages" : 1,
    "stats_publish_interval" : "1s",
    "message_throughput" : 1
  },
  "message" : {
    "payloads" : [
      "foo bar"
    ],
    "fields" : {
    }
  }
}

2) Write your parser, for example parser_to_test.punch:

{
    [logs][log] = [logs][raw_log];
    dissect("%{?a} %{&a}").on([logs][log]).into([dissect]);
}

3) Run the test:

punchplatform-log-injector.sh -c injector_conf.json --punchlets parser_to_test.punch

The punchlets must be placed under $PUNCHPLATFORM_CONF_DIR/resources/punch.
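The dissect pattern %{?a} %{&a} in the parser above uses key-reference syntax: %{?a} captures the first token as a field name and %{&a} stores the rest of the line under that name. With the "foo bar" payload from the injector file, you should therefore expect [dissect][foo] to be "bar". The Java sketch below reproduces only this case to make the expected result explicit; DissectKeyRef and dissectKeyRef are hypothetical and are not the punch dissect implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DissectKeyRef {

    // Mimics "%{?a} %{&a}" on a space-separated line:
    // the first token becomes the key, the remainder of the line its value.
    static Map<String, String> dissectKeyRef(String line) {
        int space = line.indexOf(' ');
        if (space <= 0 || space == line.length() - 1) {
            throw new IllegalArgumentException("line does not match '%{?a} %{&a}': " + line);
        }
        Map<String, String> out = new LinkedHashMap<>();
        out.put(line.substring(0, space), line.substring(space + 1));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dissectKeyRef("foo bar")); // prints {foo=bar}
    }
}
```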

Test with punchlinectl

1) Write a test topology, for example parsing_test_topology.json:

{
  "dag": [
    {
      "type": "generator_input",
      "settings": {
        "messages": [
          "{\"type\":\"fw\",\"obs\":{\"ts\":\"2018-11-12T23:56:03.000Z\"},\"init\":{\"host\":{\"ip\":\"127.0.0.1\"}},\"target\":{\"host\":{\"port\":1}}}"
        ]
      },
      "storm_settings": {
        "component": "generator"
      }
    },
    {
      "type": "punchlet_node",
      "settings": {
        "punchlet": "./your_punchlet.punch"
      },
      "storm_settings": {
        "component": "punchlet_node",
        "subscribe": [
          {
            "component": "generator"
          }
        ]
      }
    },
    {
      "type": "punchlet_node",
      "settings": {
        "punchlet_code": "{print(root);}"
      },
      "storm_settings": {
        "component": "print_node",
        "subscribe": [
          {
            "component": "punchlet_node"
          }
        ]
      }
    }
  ]
}

2) Run the topology

punchlinectl parsing_test_topology.json
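In such a topology, each node names itself in storm_settings.component, and other nodes refer to it by that name in their subscribe list. Two rules follow: component names should be unique, and every subscribe entry should point to a declared component. The Java sketch below checks both rules on a simplified, hypothetical model of the dag (Node, check); it does not read the JSON file and is not part of punchlinectl.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DagCheck {

    // Simplified view of one dag entry: its storm_settings.component and its subscribe list.
    static final class Node {
        final String component;
        final List<String> subscribe;

        Node(String component, List<String> subscribe) {
            this.component = component;
            this.subscribe = subscribe;
        }
    }

    // Returns human-readable problems; an empty list means the wiring is consistent.
    static List<String> check(List<Node> dag) {
        List<String> problems = new ArrayList<>();
        Set<String> names = new HashSet<>();
        for (Node node : dag) {
            if (!names.add(node.component)) {
                problems.add("duplicate component name: " + node.component);
            }
        }
        for (Node node : dag) {
            for (String upstream : node.subscribe) {
                if (!names.contains(upstream)) {
                    problems.add(node.component + " subscribes to unknown component: " + upstream);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical linear wiring: generator -> parser -> printer
        List<Node> ok = List.of(
            new Node("generator", List.of()),
            new Node("parser", List.of("generator")),
            new Node("printer", List.of("parser")));
        System.out.println(check(ok)); // prints []

        // Two nodes sharing one name: any subscription to that name becomes ambiguous
        List<Node> clash = List.of(
            new Node("generator", List.of()),
            new Node("punchlet_node", List.of("generator")),
            new Node("punchlet_node", List.of("generator")));
        System.out.println(check(clash)); // prints [duplicate component name: punchlet_node]
    }
}
```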

After checking that every unit test passes, set up a channel in your standalone that can handle your real logs (the example channels shipped with your standalone are a good starting point) and send some real logs into it. Check in Kibana that all your logs are parsed as expected; then you can put your punchlets in the real flow.

Contributing

The PunchPlatform needs you just as you need the PunchPlatform. Your work is extremely valuable: a standard base of punchlet parsers is our best asset.

If you use a standard log parser, you are assured of support in case of problems. So if you make your parser (or your update) part of the standard, experts can help you.
If you need a new parser, first check the standard_log_parsers chapter: a teammate may already have written it. By contributing, you take part in this virtuous circle.

Write development unit tests

All standard parsers are located in the pp-resources repository.

Then:

cd standard-resources

1) Create your parser. For example, the apache_httpd parser is made of these files:

ls -l resources/punch/standard/apache_httpd/
-rw-r--r--  1 loicjardin  staff    216 Nov 22 08:46 enrichment.punch
-rw-r--r--  1 loicjardin  staff    306 Nov 22 08:46 enrichment_useragents.punch
-rw-r--r--  1 loicjardin  staff   1624 Nov 22 08:46 http_codes.json
-rw-r--r--  1 loicjardin  staff    509 Nov 22 08:46 normalization.punch
-rw-r--r--  1 loicjardin  staff   2796 Nov 22 08:46 parsing.punch
-rw-r--r--  1 loicjardin  staff  27396 Nov 22 08:46 taxonomy.json

2) Create the development test resources (sample logs and unit test files):

ls -l src/test/resources/standard/apache_httpd/
total 24
-rw-r--r--  1 loicjardin  staff   441 Nov 22 08:46 sample_1.txt
-rw-r--r--  1 loicjardin  staff  1027 Nov 22 08:46 unit_1.json
-rw-r--r--  1 loicjardin  staff   595 Nov 22 08:46 unit_2.json

For example:

cat src/test/resources/standard/apache_httpd/unit_1.json
{
  "input": {
    "message": "168.168.168.168 - - [18/Aug/2011:06:00:14 -0700] \"GET /style2.css HTTP/1.1\" 200 659433 \"http://www.semicomplete.com/blog/geekery/bypassing-captive-portals.html\" \"Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5\""
  },
  "output": {
    "includes": {
      "app": {
        "method": "GET",
        "return": {
          "code": "200"
        }
      },
      "init": {
        "host": {
          "ip": "168.168.168.168"
        },
        "process": {
          "name": "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
        }
      },
      "obs": {
        "ts": "2011-08-18T15:00:14.000+02:00"
      },
      "session": {
        "out": {
          "byte": 659433
        }
      },
      "target": {
        "uri": {
          "urn": "/style2.css"
        }
      },
      "type": "web"
    }
  }
}

3) Create the test

cat src/test/java/org/thales/punch/parsers/test/ApacheHttpdTest.java
package org.thales.punch.parsers.test;

import java.net.URISyntaxException;

import com.thales.services.cloudomc.punchplatform.punch.api.test.PlayBook;

import org.testng.Assert;
import org.testng.annotations.*;


@SuppressWarnings("javadoc")
@Test // with a class-level @Test, TestNG runs every public method as a test
public class ApacheHttpdTest
{
    PlayBook playbook;

    // Build the playbook once for the class: grok patterns, input field and punchlet under test
    @BeforeClass(alwaysRun = true)
    public void setup() throws URISyntaxException {
        playbook = new PlayBook(ApacheHttpdTest.class)
        .setBreakPoint(new PlayBook.BreakPoint() {
            public void breakpoint() {
                System.out.println("put a breakpoint here if you have errors");
            }
        });
        playbook.addGrokPatterns("punch/patterns")
        .setInputTupleField("[logs][data]")
        .addResources(
            "punch/standard/apache_httpd/parsing.punch"
        );
    }

    // Close the playbook once all tests have run
    @AfterClass(alwaysRun = true)
    public void close() {
        playbook.close();
    }

    // Plays the raw sample logs of sample_1.txt through the punchlet
    public void sample_1() throws Exception
    {
        Assert.assertTrue(
            playbook.playSampleLogs("standard/apache_httpd/sample_1.txt")
        );
    }

    // Plays unit_1.json: its input message, checked against its expected output
    public void unit_1() throws Exception
    {
        Assert.assertTrue(
            playbook.playUnitTest("standard/apache_httpd/unit_1.json")
        );
    }

    // Same check for unit_2.json
    public void unit_2() throws Exception
    {
        Assert.assertTrue(
            playbook.playUnitTest("standard/apache_httpd/unit_2.json")
        );
    }
}

4) Run the test

mvn test -Dtest=ApacheHttpdTest
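When writing the expected output of a unit file, timestamps are the usual pitfall. In unit_1.json, the expected [obs][ts] is the Apache timestamp 18/Aug/2011:06:00:14 -0700 expressed as the same instant at a +02:00 offset: 06:00:14 at -07:00 is 13:00:14 UTC, i.e. 15:00:14 at +02:00. Which offset your parser emits presumably depends on the timezone configured for your tests, which is an assumption here. The Java sketch below reproduces the conversion with java.time to help you compute such expected values; ApacheTs and toIso are hypothetical names.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ApacheTs {

    // Apache access-log timestamp, e.g. "18/Aug/2011:06:00:14 -0700"
    private static final DateTimeFormatter APACHE =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // ISO-8601 with milliseconds and a "+02:00"-style offset, as in unit_1.json
    private static final DateTimeFormatter ISO_MILLIS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT);

    // Re-express the same instant at the target offset
    static String toIso(String apacheTs, ZoneOffset target) {
        OffsetDateTime parsed = OffsetDateTime.parse(apacheTs, APACHE);
        return parsed.withOffsetSameInstant(target).format(ISO_MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(toIso("18/Aug/2011:06:00:14 -0700", ZoneOffset.ofHours(2)));
        // prints 2011-08-18T15:00:14.000+02:00
    }
}
```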