GrokOperator (Punch Punch Language Runtime Library 6.4.5-SNAPSHOT API)

java.lang.Object
- org.thales.punch.libraries.punchlang.operator.GrokOperator

All Implemented Interfaces: IGrokOperator

Direct Known Subclasses: GroksOperator

public class GrokOperator
extends Object
implements IGrokOperator

The GrokOperator is the operator invoked to match grok patterns. It lets you match substrings of an input string and store the matched substrings directly in an output Tuple.

Basics

The syntax of the grok operator is as follows:


 grok("your grok expression").on("the input value")

A grok expression is made of subexpressions using the following format:


 %{PATTERN:OPTIONAL_DESTINATION}

where PATTERN is a predefined grok pattern. A grok pattern lets you name the field where the matches (if any) must be stored. With punch, these fields are created automatically in a target tuple. If you specify no destination, the root tuple is assumed.

An example will make all this much clearer. Here is a grok pattern that will parse a syslog timestamp:


 SYSLOGTIMESTAMP %{MONTH:month} +%{MONTHDAY:day} %{TIME:time}

It is made of three subpatterns:


 MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
 MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
 TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])

If you write


 grok("%{SYSLOGTIMESTAMP}").on("Jan 21 12:10:39");

There will be a match, and you will end up with your root tuple filled like this:


  {
    "month" : "Jan",
    "day" : "21",
    "time" : "12:10:39"
  }
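
To make the mechanism concrete, here is a minimal plain-Java sketch of the same idea using named capturing groups from java.util.regex. It does not use the punch runtime: the class name SyslogTimestampSketch is hypothetical, a Map stands in for the root tuple, and the three regular expressions are simplified stand-ins for the MONTH, MONTHDAY and TIME patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyslogTimestampSketch {
    // Simplified stand-ins for the MONTH, MONTHDAY and TIME subpatterns.
    static final Pattern SYSLOGTIMESTAMP = Pattern.compile(
        "(?<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +"
        + "(?<day>3[01]|[12][0-9]|0?[1-9]) "
        + "(?<time>[0-9]{2}:[0-9]{2}:[0-9]{2})");

    // Returns the named matches, or an empty map when nothing matches.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = SYSLOGTIMESTAMP.matcher(input);
        if (m.find()) {
            fields.put("month", m.group("month"));
            fields.put("day", m.group("day"));
            fields.put("time", m.group("time"));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints {month=Jan, day=21, time=12:10:39}
        System.out.println(parse("Jan 21 12:10:39"));
    }
}
```

Each named group plays the role of a named grok subpattern: the group name becomes the field name, and the matched text becomes its value.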

Instead of filling the (implicit) root tuple you can send all the matches elsewhere, i.e.:


 grok("%{SYSLOGTIMESTAMP:[mytimestamp]}").on("Jan 21 12:10:39");

produces


  {
    "mytimestamp" : {
      "month" : "Jan",
      "day" : "21",
      "time" : "12:10:39"
    }
  }

You can also send the result to a local Tuple. This is handy when you do not want to alter the root tuple of your punchlet.


 Tuple tmp;
 grok("%{SYSLOGTIMESTAMP:tmp:[mytimestamp]}").on("Jan 21 12:10:39");
 // The matches are stored in a local variable. You can then
 // copy the fields you need into the root tuple.
 root:[month] = tmp:[mytimestamp][month];
 ...

Note that the time property has not been decomposed into hour, minute and second, because the TIME pattern uses anonymous grok subpatterns. If instead of:


 TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])

you redefine it as


 TIME (?!<[0-9])%{HOUR:hour}:%{MINUTE:minute}(?::%{SECOND:second})(?![0-9])

then the same expression produces:


 {
   "month" : "Jan" ,
   "day" : "21" ,
   "time" : "12:10:39",
   "hour" : "12",
   "minute" : "10",
   "second" : "39"
 }

That is, you specify which part must be stored in which (JSON) property by using named or anonymous grok subpatterns. The same holds for the top-level expression. If you write


 grok("%{SYSLOGTIMESTAMP:/timestamp}").on("Jan 21 12:10:39");

and redefine all grok patterns to be anonymous (i.e. you remove all lower-case names from the MONTH, MONTHDAY, HOUR, etc. subpatterns), then you'll get:


 { "timestamp" : "Jan 21 12:10:39" }

But if at least one subpattern is named, you'll get:


 { "timestamp" : { "month" : "Jan" , "day" : "21" , "time" : "12:10:39" } }
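
One way to picture the named versus anonymous distinction is as a textual expansion into a regular expression. The sketch below is an illustration only, not how the punch runtime or the underlying grok library is implemented: the class GrokExpandSketch and its expand method are hypothetical, destinations are limited to simple names (no tuple paths), and the definitions map is assumed to be free of cycles. A named reference becomes a named capturing group, and an anonymous one becomes a non-capturing group.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GrokExpandSketch {
    // Matches %{NAME} or %{NAME:field}. Field names are restricted to
    // letters and digits because Java group names allow nothing else.
    private static final Pattern REF =
        Pattern.compile("%\\{([A-Z0-9_]+)(?::([a-zA-Z][a-zA-Z0-9]*))?\\}");

    // Recursively replaces every reference with its definition.
    static String expand(String expr, Map<String, String> defs) {
        Matcher m = REF.matcher(expr);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String def = defs.get(m.group(1));
            if (def == null) {
                throw new IllegalArgumentException("unknown pattern " + m.group(1));
            }
            String body = expand(def, defs);
            String field = m.group(2);
            out.append(expr, last, m.start());
            // Named reference: capturing group. Anonymous: non-capturing.
            out.append(field == null
                ? "(?:" + body + ")"
                : "(?<" + field + ">" + body + ")");
            last = m.end();
        }
        return out.append(expr, last, expr.length()).toString();
    }

    public static void main(String[] args) {
        Map<String, String> defs = new HashMap<>();
        defs.put("HOUR", "[0-9]{2}");
        defs.put("MINUTE", "[0-9]{2}");
        defs.put("TIME", "%{HOUR:hour}:%{MINUTE:minute}");
        // Prints (?<time>(?<hour>[0-9]{2}):(?<minute>[0-9]{2}))
        System.out.println(expand("%{TIME:time}", defs));
    }
}
```

If TIME is redefined as "%{HOUR}:%{MINUTE}", only the time group remains capturing, which mirrors the behaviour described above: the anonymous parts still take part in the match but produce no field of their own.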

Using Oniguruma Patterns

You can use Oniguruma syntax if you do not find a ready-to-use grok pattern.


 String input = "AAA000DDDD";
 grok("(?<queue_id>[0-9A-F]{10,11})").on(input);

produces


 { "queue_id" : "AAA000DDDD" }

You can use "[" and "]" delimiters to send the result to a nested field.


 String input = "AAA000DDDD";
 grok("(?<[postfix][queue_id]>[0-9A-F]{10,11})").on(input);

produces


 { "postfix" : { "queue_id" : "AAA000DDDD" }}
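
The bracketed destination can be read as a path into nested maps. The following plain-Java sketch shows one way to store a match under such a path. It is an assumption-laden illustration: the class NestedFieldSketch is hypothetical, a Map stands in for the punch Tuple, and intermediate nodes are assumed to be either absent or already maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedFieldSketch {
    private static final Pattern KEY = Pattern.compile("\\[([^\\]]+)\\]");

    // Splits "[postfix][queue_id]" into ["postfix", "queue_id"].
    static List<String> keys(String path) {
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(path);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    // Stores value under the nested path, creating intermediate maps.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String path, Object value) {
        List<String> keys = keys(path);
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("no [key] in path: " + path);
        }
        Map<String, Object> node = root;
        for (int i = 0; i < keys.size() - 1; i++) {
            node = (Map<String, Object>) node.computeIfAbsent(
                keys.get(i), k -> new LinkedHashMap<String, Object>());
        }
        node.put(keys.get(keys.size() - 1), value);
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("[0-9A-F]{10,11}").matcher("AAA000DDDD");
        Map<String, Object> root = new LinkedHashMap<>();
        if (m.matches()) {
            put(root, "[postfix][queue_id]", m.group());
        }
        // Prints {postfix={queue_id=AAA000DDDD}}
        System.out.println(root);
    }
}
```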

Dynamic Output Fields

You can send matches to dynamic tuple fields. Consider the following example:


  Tuple t;
  t:[num1] = "number1";
  Tuple result;
  grok("%{NUMBER:result:[%{t:[num1]}]}").on("17");
  print(result);

Generates:


 {
   "number1" : 17
 }

This can be nested arbitrarily:


  Tuple t1;
  t1:[user] = "bob";
  Tuple t2;
  t2:[age] = "age";
  Tuple result;
  grok("%{NUMBER:result:[%{t1:[user]}][%{t2:[age]}]}").on("17");
  print(result);

It produces the tuple "result":


  {
    "bob" : {
      "age" : 17
    }
  }
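
Conceptually, each %{tuple:[key]} reference in the destination is replaced by the value found in that tuple before the match is stored. The sketch below illustrates that substitution step in plain Java. It is only an assumption about the behaviour, not the punch implementation: the class DynamicFieldSketch is hypothetical, and nested maps stand in for the local tuples.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DynamicFieldSketch {
    // Matches references of the form %{tupleName:[key]}.
    private static final Pattern REF =
        Pattern.compile("%\\{(\\w+):\\[(\\w+)\\]\\}");

    // Replaces each reference with the value held by the named tuple.
    static String resolve(String path, Map<String, Map<String, String>> tuples) {
        Matcher m = REF.matcher(path);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            Map<String, String> tuple = tuples.get(m.group(1));
            if (tuple == null || !tuple.containsKey(m.group(2))) {
                throw new IllegalArgumentException("unresolved: " + m.group());
            }
            out.append(path, last, m.start());
            out.append(tuple.get(m.group(2)));
            last = m.end();
        }
        return out.append(path, last, path.length()).toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> tuples = new HashMap<>();
        tuples.put("t1", new HashMap<>());
        tuples.put("t2", new HashMap<>());
        tuples.get("t1").put("user", "bob");
        tuples.get("t2").put("age", "age");
        // Prints [bob][age]
        System.out.println(resolve("[%{t1:[user]}][%{t2:[age]}]", tuples));
    }
}
```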

Using Multiple Patterns

The grok operator accepts multiple patterns as arguments. It then runs each one in turn and stops at the first match. For example:


  grok("%{WORD:[word]} %{GREEDYDATA}", "%{IP:[ip]}", "%{HOST:[host]}")
    .on("55.3.244.1");

produces:


    {
         "ip" : "55.3.244.1"
    }

You can request it to run all patterns as follows:


  grok("%{WORD:[word]} %{GREEDYDATA}", "%{IP:[ip]}", "%{HOST:[host]}")
    .breakOnMatch(false).on("55.3.244.1");

It then produces:


    {
         "ip" : "55.3.244.1",
         "host" : "55.3.244.1"
    }
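
The difference between stopping at the first match and running every pattern can be sketched in plain Java as a loop with an optional early exit. This is a rough illustration, not the operator's implementation: the class MultiPatternSketch is hypothetical, and the three regular expressions are simplified stand-ins for the WORD/GREEDYDATA, IP and HOST grok patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiPatternSketch {
    // Field name -> simplified pattern, kept in declaration order.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("word", Pattern.compile("(\\w+) .*"));
        PATTERNS.put("ip", Pattern.compile("(\\d{1,3}(?:\\.\\d{1,3}){3})"));
        PATTERNS.put("host", Pattern.compile("([A-Za-z0-9][A-Za-z0-9.-]*)"));
    }

    // Tries each pattern in turn against the whole input.
    static Map<String, String> run(String input, boolean breakOnMatch) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            if (m.matches()) {
                out.put(e.getKey(), m.group(1));
                if (breakOnMatch) {
                    break; // stop at the first matching pattern
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("55.3.244.1", true));  // {ip=55.3.244.1}
        System.out.println(run("55.3.244.1", false)); // {ip=55.3.244.1, host=55.3.244.1}
    }
}
```

The first pattern never matches here because the input contains no space, so the result depends only on whether the loop continues past the ip pattern.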

Type inference

By default grok treats all matches as strings. You can have a grok operator guess the types using the inferTypes() action. For example:


  grok("%{SYSLOGTIMESTAMP}").inferTypes().on("Jan 21 12:10:39");

This produces the following (note that 21 is a number, not a string):


  { "month" : "Jan" , "day" : 21 , "time" : "12:10:39" }
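
The exact inference rules are not spelled out here, but the idea can be sketched as trying progressively looser parsers and falling back to the original string. The plain-Java sketch below assumes the order boolean, long, double, String. The class InferTypeSketch is hypothetical and the real operator may apply different rules.

```java
class InferTypeSketch {
    // Returns a Boolean, Long, Double or the original String.
    static Object infer(String s) {
        if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false")) {
            return Boolean.parseBoolean(s);
        }
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException ignored) {
            // not an integer, try a floating point value next
        }
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException ignored) {
            // not a number either, keep the string
        }
        return s;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Jan", "21", "12:10:39"}) {
            Object v = infer(s);
            System.out.println(s + " -> " + v.getClass().getSimpleName());
        }
    }
}
```

With this ordering, "21" becomes a Long while "Jan" and "12:10:39" stay Strings, which matches the output shown above.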

Constructor Summary

Constructors
Constructor and Description
`GrokOperator()` Default constructor.
`GrokOperator(io.krakens.grok.api.Grok grok, Tuple[] roots, TupleFetcher[] fetchers)` This constructor is not meant to be called by applications.

Method Summary

Modifier and Type	Method and Description
`GrokOperator`	`breakOnMatch(boolean b)`
`GrokOperator`	`evaluate()` Make the operator work in evaluation mode.
`GrokOperator`	`inferTypes()` Activate type inference.
`boolean`	`on(String input)` Fire the grok matching on a String. Depending on your grok pattern, matches will be dispatched to your destination tuples.
`boolean`	`on(Tuple input)` Fire the grok matching.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - GrokOperator
```
public GrokOperator(io.krakens.grok.api.Grok grok,
                    Tuple[] roots,
                    TupleFetcher[] fetchers)
```
    This constructor is not meant to be called by applications. It is
    
    Parameters:
    
    grok - the precompiled grok expression
    
    roots - the destination tuples
    
    fetchers - the destination tuple fetchers
  - GrokOperator
```
public GrokOperator()
```
    Default constructor.
- Method Detail
  - inferTypes
```
public GrokOperator inferTypes()
```
    Activate type inference. Instead of generating Strings for all matches, the generated values will be booleans, longs, doubles or Strings.
    
    Specified by:
    
    inferTypes in interface IGrokOperator
    
    Returns:
    
    this operator
  - evaluate
```
public GrokOperator evaluate()
```
    Make the operator work in evaluation mode. The evaluation (guess) mode consists in trying the grok expression on all the substrings of your input log, from left to right.
    This helps you check whether a log matches a grok pattern without the burden of parsing and removing a header part.
    Of course the guess mode is both inefficient and not 100% reliable. Besides, it will not put any match in your destination tuple(s). Only use it to know whether part of your log matches a pattern.
    
    Returns:
    
    this operator
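    
    The left-to-right scan can be pictured with the plain-Java sketch below, which retries a pattern at every start offset and reports only whether it matched somewhere. This is an illustration under assumptions, not the operator's implementation: the class EvaluateSketch and its method are hypothetical, and a java.util.regex Pattern stands in for the compiled grok expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EvaluateSketch {
    // Tries the pattern at each start offset, from left to right.
    // Nothing is captured: only a yes/no answer is returned.
    static boolean matchesSomewhere(Pattern p, String log) {
        Matcher m = p.matcher(log);
        for (int start = 0; start <= log.length(); start++) {
            m.region(start, log.length());
            if (m.lookingAt()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern ip = Pattern.compile("\\d{1,3}(?:\\.\\d{1,3}){3}");
        System.out.println(matchesSomewhere(ip, "<13> host1 55.3.244.1 login ok")); // true
        System.out.println(matchesSomewhere(ip, "no address here"));                // false
    }
}
```

    Retrying at every offset is what makes this mode costly on long logs, which is consistent with the advice to use it only for checks.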
  - on
```
public boolean on(Tuple input)
```
    Fire the grok matching. Depending on your grok pattern, matches will be dispatched to your destination tuples.
    
    Specified by:
    
    on in interface IGrokOperator
    
    Parameters:
    
    input - the input Tuple
    
    Returns:
    
    true if there was some match
  - on
```
public boolean on(String input)
```
    Fire the grok matching on a String. Depending on your grok pattern, matches will be dispatched to your destination tuples.
    
    Specified by:
    
    on in interface IGrokOperator
    
    Parameters:
    
    input - the input string
    
    Returns:
    
    true if there was some match
  - breakOnMatch
```
public GrokOperator breakOnMatch(boolean b)
```
    Specified by:
    
    breakOnMatch in interface IGrokOperator
