Write your first parser

This chapter assumes you have gone through the Getting started chapter (at least a quick reading). In this chapter we will see how the punch language makes it easy to write a complete log parser.

Step 1 : Get a log

The first thing to do is to understand the log you are dealing with. Here is a sample log (we took an Arkoon log as an example).

Sep 21 21:24:03 fakehost Alerts: AKLOG - id=firewall time="2017-09-05 23:59:56" gmtime=1504648796 pri=2 fw=thsintinfw01p.thales aktype=ALERT alert_type="Blocked by filter" user="bob" alert_level="Medium" alert_desc="UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]"

In there you have various parts. You (typically) have a header with some timestamp. It can be a standard syslog header, or something else. In our example it is a syslog header Sep 21 21:24:03 fakehost.

Next you have vendor-specific information. Here we have a keyword, Alerts:, the Arkoon vendor tag AKLOG -, and a series of keys with values.

Step 2 : Think

The Punch language provides several operators to extract parts of your log and store each interesting match in your JSON document. In a nutshell, here is what you have:

  • kv : the key value operator to parse logs with field1=value1 field2=value2 … fieldN=valueN (our case here),
  • csv : the CSV operator to parse logs with value1;value2;…;valueN,
  • date : to convert dates from and to arbitrary formats,
  • syslogHeader : to extract a standard (RFC-compliant) syslog header,
  • grok : the Grok operator gives you a large number of predefined regexes to extract arbitrary patterns,
  • dissect : the dissect operator is a sort of super-splitter, more efficient than the grok operator,
  • many (Java) methods such as the String startsWith, endsWith, matches, etc.

Have a look at the existing parsers to see how most usual cases have been dealt with. The grok operator is the “fallback” solution. It works using predefined regular expressions (defined in .grok files), ready to use. It is very powerful, but you pay a performance penalty. Always check if you can get the job done with dissect, kv or csv first.

In our case, you have :

  • the syslog header Sep 21 21:24:03 fakehost,
  • a custom log header Alerts: AKLOG -,
  • a key-value (KV) payload.
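
To make this decomposition concrete, here is a minimal sketch in plain Python (not Punch). It is only an illustration: the function name split_arkoon_log is hypothetical, the sample line is shortened, and the syslog header is assumed to be exactly the first four space-separated tokens, which holds for this sample but not for every syslog variant.

```python
# Shortened copy of the sample log, for illustration only.
SAMPLE = (
    'Sep 21 21:24:03 fakehost Alerts: AKLOG - id=firewall '
    'time="2017-09-05 23:59:56" gmtime=1504648796 pri=2'
)

VENDOR_TAG = "AKLOG - "


def split_arkoon_log(line):
    """Return (syslog_header, custom_header, kv_payload) for one log line."""
    # Assumption: the header is "Mon DD HH:MM:SS host", i.e. four tokens.
    tokens = line.split(" ", 4)
    if len(tokens) < 5:
        raise ValueError("does not start with a syslog header")
    syslog_header = " ".join(tokens[:4])
    rest = tokens[4]

    # Everything up to and including the vendor tag is the custom header.
    before, tag, payload = rest.partition(VENDOR_TAG)
    if not tag:
        raise ValueError("vendor tag not found")
    return syslog_header, (before + tag).strip(), payload


header, custom, payload = split_arkoon_log(SAMPLE)
print(header)
print(custom)
print(payload)
```

In a punchlet, the first split is the job of the syslogHeader operator and the last part is handed to the kv operator, so none of this has to be written by hand.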

Step 3 : Write the Punchlet

Naming Scheme

Punchlets are named using a well-defined scheme, and we suggest you stick to it. The punchlet name reflects the role it plays.

Here we will create:

$PUNCHPLATFORM_CONF_DIR/resources/punch/mytenant/arkoon_ngfw/parsing.punch

Where :

  • mytenant is the tenant’s name,
  • arkoon is the producer’s name,
  • ngfw is the technology’s name,
  • parsing is the punchlet’s job.

This scheme is used for all configuration and resource files including grok files, injector files and so on. For example:

$PUNCHPLATFORM_CONF_DIR/resources/punch/patterns/mytenant_arkoon_ngfw.grok
$PUNCHPLATFORM_CONF_DIR/resources/injector/mytenant/arkoon_ngfw_injector.json
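
The naming scheme is mechanical enough to be expressed as two small helpers. The following plain-Python sketch is an illustration only: the function names are hypothetical and not part of the platform, and only the relative parts of the paths are built.

```python
def punchlet_path(tenant, producer, technology, job):
    """Relative punchlet path: <tenant>/<producer>_<technology>/<job>.punch"""
    return f"{tenant}/{producer}_{technology}/{job}.punch"


def grok_file_name(tenant, producer, technology):
    """Grok pattern file name: <tenant>_<producer>_<technology>.grok"""
    return f"{tenant}_{producer}_{technology}.grok"


print(punchlet_path("mytenant", "arkoon", "ngfw", "parsing"))
# mytenant/arkoon_ngfw/parsing.punch
print(grok_file_name("mytenant", "arkoon", "ngfw"))
# mytenant_arkoon_ngfw.grok
```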

When used in an LMC context, your punchlet receives the input logs under the stream “logs”, field “log”. To you, it is simply a JSON document, as illustrated next:

{
   "logs" : {
     "log" : "Sep 21 21:24:03 fakehost Alerts: AK... "
   }
}

Remember that a JSON document is represented in Punch as a Tuple. Your punchlet thus accesses that log using the punch root:[logs][log] instruction.
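
If you are more familiar with general-purpose languages, the following plain-Python sketch shows the equivalent access on a nested dictionary. It is an analogy only, not Punch code; the helper get_path is hypothetical and the log value is the truncated one shown above.

```python
import json

raw = '{"logs": {"log": "Sep 21 21:24:03 fakehost Alerts: AK... "}}'
root = json.loads(raw)


def get_path(document, *keys):
    """Walk nested dictionaries; return None if any key is missing."""
    node = document
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Rough counterpart of reading root:[logs][log] in a punchlet.
log = get_path(root, "logs", "log")
print(log)
```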

Most often, a parsing punchlet is divided into three parts:

  1. Input Check : checks that the input document (i.e. Tuple) is valid.
  2. Syntax analysis : extracts and analyses the important fields from your logs.
  3. Field Binding : makes sure all parts are stored under the fields ultimately expected by Elasticsearch.

Input check

Let us bootstrap our punchlet by retrieving the received log.

// @test Sep 21 21:24:03 fakehost Alerts: AKLOG - id=firewall time="2017-09-05 23:59:56" gmtime=1504648796 pri=2 fw=thsintinfw01p.thales aktype=ALERT alert_type="Blocked by filter" user="bob" alert_level="Medium" alert_desc="UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]"
{
  [logs][log][message] = [logs][log]; // Saving the original message here
}

Hit Ctrl + B in Sublime Text to test this small stub.

Syntax analysis

Let us now cut our log step by step to extract the interesting parts. We will use a mix of punch operators and plain Java methods.

// @test Sep 21 21:24:03 fakehost Alerts: AKLOG - id=firewall time="2017-09-05 23:59:56" gmtime=1504648796 pri=2 fw=thsintinfw01p.thales aktype=ALERT alert_type="Blocked by filter" user="bob" alert_level="Medium" alert_desc="UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]"
{
  [logs][log][message] = [logs][log];

  // We will use local variables. These are handy to store
  // intermediary results, without the burden of performing any cleanup
  // when the punchlet returns.
  Tuple tmp;

  // The syslogHeader operator parses the header and splits it from the rest.
  // Here we choose to put the two parts in tmp:[header] and tmp:[greedy]
  if (!syslogHeader().on([logs][log][message]).into(tmp:[header], tmp:[greedy])) {
      raise("does not start with a syslog header");
  }

  // We want to get rid of the 'Alerts: AKLOG - ' part. Here we use the
  // usual Java String methods substring and indexOf.
  // You can apply these to a punch Tuple directly.
  // I.e. writing
  //    tmp:[greedy].indexOf("AKLOG - ") + 8
  // is equivalent to writing
  //    tmp:[greedy].asString().indexOf("AKLOG - ") + 8
  String greedy = tmp:[greedy].substring(tmp:[greedy].asString().indexOf("AKLOG - ") + 8);

  // Last, we use the key value operator. It nicely stores all the submatches
  // under the "kv" dictionary of our tmp Tuple.
  if (!kv().on(greedy).into(tmp:[kv])) {
      raise("not a kv log");
  }

  // For debugging only : get your results
  print(tmp);
}

Hit Ctrl + B to see the result. You should see this in the Sublime Text console.

{
  "header": {
    "host": {
      "name": "fakehost"
    },
    "alarm": {
      "sev": 0,
      "facility": 0
    },
    "ts": "2017-09-21T21:24:03.000+02:00"
  },
  "greedy": "Alerts: AKLOG - id=firewall time=\"2017-09-05 23:59:56\" gmtime=1504648796 pri=2 fw=thsintinfw01p.thales aktype=ALERT alert_type=\"Blocked by filter\" user=\"bob\" alert_level=\"Medium\" alert_desc=\"UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]\"",
  "kv": {
    "fw": "thsintinfw01p.thales",
    "id": "firewall",
    "aktype": "ALERT",
    "pri": "2",
    "gmtime": "1504648796",
    "time": "2017-09-05 23:59:56",
    "alert_level": "Medium",
    "user": "bob",
    "alert_type": "Blocked by filter",
    "alert_desc": "UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]"
  }
}

Good, you have successfully cut your log.
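
To get a feel for what the key value step does with quoted values, here is a rough plain-Python sketch. It is not how the Punch kv operator is implemented; it relies on the standard shlex module to honour the double quotes, and the function name parse_kv is hypothetical.

```python
import shlex

# The part of the sample log that follows "AKLOG - ".
PAYLOAD = (
    'id=firewall time="2017-09-05 23:59:56" gmtime=1504648796 pri=2 '
    'fw=thsintinfw01p.thales aktype=ALERT alert_type="Blocked by filter" '
    'user="bob" alert_level="Medium" '
    'alert_desc="UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]"'
)


def parse_kv(payload):
    """Turn 'k1=v1 k2="v 2"' into a dict, keeping quoted values whole."""
    fields = {}
    # shlex.split keeps a quoted value attached to its key and drops the quotes.
    for token in shlex.split(payload):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a kv log: {token!r}")
        fields[key] = value
    return fields


kv = parse_kv(PAYLOAD)
print(kv["time"])        # 2017-09-05 23:59:56
print(kv["alert_type"])  # Blocked by filter
```

As in the punchlet output above, every value is a string, including numeric ones such as pri.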

Field binding

What we just got relies on the Arkoon naming convention (i.e. aktype, pri, alert_level, etc.). We must normalise the fields. If we succeed in normalising our data, we will be able to query all logs, whatever the vendor, based on fields having the same semantics, such as “alert level”.

The taxonomy of punchplatform normalisation is documented in the Event Normalization section. Here we will bind, among others, the following fields:

  • “pri” into alarm.severity,
  • “fw” into obs.host.name.

Doing that is easy and compact with punch. It looks as follows.

  ...
  if (!kv().on(greedy).into(tmp:[kv])) {
    raise("not a kv log");
  }

  [logs][log][alarm][category]    = tmp:[kv][aktype];
  [logs][log][alarm][severity]    = tmp:[kv][pri];
  [logs][log][obs][host][name]    = tmp:[kv][fw];
  [logs][log][alarm][name]        = tmp:[kv][alert_type];
  [logs][log][init][usr][name]    = tmp:[kv][user];
  [logs][log][alarm][description] = tmp:[kv][alert_desc];
}

Put that at the end of your punchlet, and hit Ctrl + B again. Your punchlet is now complete. Congratulations!
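
The binding step is essentially a lookup table from vendor field names to normalised paths. The plain-Python sketch below illustrates that idea with the same six bindings as the punchlet; it is not Punch code, the names BINDINGS and bind_fields are hypothetical, and skipping absent vendor fields is an assumption of this sketch.

```python
# Vendor field name -> path in the normalised document.
BINDINGS = {
    "aktype": ("alarm", "category"),
    "pri": ("alarm", "severity"),
    "fw": ("obs", "host", "name"),
    "alert_type": ("alarm", "name"),
    "user": ("init", "usr", "name"),
    "alert_desc": ("alarm", "description"),
}


def bind_fields(kv, bindings=BINDINGS):
    """Copy vendor fields into a nested, vendor-neutral document."""
    document = {}
    for source, path in bindings.items():
        if source not in kv:
            continue  # assumption: absent vendor fields are simply skipped
        node = document
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create intermediate levels
        node[path[-1]] = kv[source]
    return document


sample_kv = {"pri": "2", "fw": "thsintinfw01p.thales", "aktype": "ALERT"}
normalised = bind_fields(sample_kv)
print(normalised)
```

Keeping the mapping in one table makes it easy to compare parsers from different vendors: only the left-hand names change, the normalised paths stay the same.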

First version

Our “ready-to-go” version of this punchlet is shown below. We simply removed the inline comments, added section comments to make the input check, syntax analysis and field binding parts easy to spot, and replaced [logs][log] with the document: alias. This is a matter of coding style but, in our experience, it is the best way to keep a punchlet clean and easy to maintain.

// @test Sep 21 21:24:03 fakehost Alerts: AKLOG - id=firewall time="2017-09-05 23:59:56" gmtime=1504648796 pri=2 fw=thsintinfw01p.thales aktype=ALERT alert_type="Blocked by filter" user="bob" alert_level="Medium" alert_desc="UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]"
{
    ///////////////////////////////////////////
    //  BLOCK : INPUT CHECK
    ///////////////////////////////////////////

    Tuple document = [logs][log];
    document:[message] = [logs][log];

    ///////////////////////////////////////////
    //  BLOCK : SYNTAX ANALYSIS
    ///////////////////////////////////////////

    Tuple tmp;
    if (!syslogHeader().on(document:[message]).into(tmp:[header], tmp:[greedy])) {
        raise("does not start with a syslog header");
    }
    String greedy = tmp:[greedy].substring(tmp:[greedy].indexOf("AKLOG - ") + 8);

    if (!kv().on(greedy).into(tmp:[kv])) {
        raise("not a kv log");
    }

    ///////////////////////////////////////////
    //  BLOCK : FIELD BINDING
    ///////////////////////////////////////////

    document:[alarm][category]    = tmp:[kv][aktype];
    document:[alarm][severity]    = tmp:[kv][pri];
    document:[obs][host][name]    = tmp:[kv][fw];
    document:[alarm][name]        = tmp:[kv][alert_type];
    document:[init][usr][name]    = tmp:[kv][user];
    document:[alarm][description] = tmp:[kv][alert_desc];
}

You now have your parser. It produces the following normalised document.

{
  "logs": {
    "log": {
      "obs": {
        "host": {
          "name": "thsintinfw01p.thales"
        }
      },
      "init": {
        "usr": {
          "name": "bob"
        }
      },
      "alarm": {
        "severity": "2",
        "name": "Blocked by filter",
        "description": "UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]",
        "category": "ALERT"
      },
      "message": "Sep 21 21:24:03 fakehost Alerts: AKLOG - id=firewall time=\"2017-09-05 23:59:56\" gmtime=1504648796 pri=2 fw=thsintinfw01p.thales aktype=ALERT alert_type=\"Blocked by filter\" user=\"bob\" alert_level=\"Medium\" alert_desc=\"UDP from 88.86.115.150:9987 to 192.54.144.31:33719 [default_rule]\""
    }
  }
}

What to do next ?

From there, navigate to one of the following topics:

  • To integrate your parser into a channel, see createYourChannel;
  • To improve your parser, and dig further with punch programming, see punchletImprovement;
  • To become a true parser developer, see perserDevelopment.