public class GrokOperator extends Object implements IGrokOperator
The syntax of the grok operator is as follows:
grok("your grok expression").on("the input value")
A grok expression is made of subexpressions using the following format:
%{PATTERN:OPTIONAL_DESTINATION}
Where PATTERN is a predefined grok pattern and OPTIONAL_DESTINATION is the name of the field where the match (if any) must be stored. With punch, these fields are automatically created in a target tuple. If you specify no destination tuple, the root tuple is assumed.
An example will make all this much clearer. Here is a grok pattern that will parse a syslog timestamp:
SYSLOGTIMESTAMP %{MONTH:month} +%{MONTHDAY:day} %{TIME:time}
It is made of three subpatterns:
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
If you write
grok("%{SYSLOGTIMESTAMP}").on("Jan 21 12:10:39");
There will be a match, and your root tuple will be filled like this:
{
"month" : "Jan",
"day" : "21",
"time" : "12:10:39"
}
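To make the expansion mechanism concrete, here is a minimal sketch in plain Java of how %{PATTERN:destination} references could be turned into named regex groups. It is not the punch implementation: the class MiniGrok, its pattern dictionary, and the simplified MONTH, MONTHDAY and TIME definitions are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniGrok {
    // Simplified pattern dictionary (assumption: not the real grok pattern files).
    static final Map<String, String> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("MONTH", "\\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\b");
        PATTERNS.put("MONTHDAY", "(?:0[1-9]|[12][0-9]|3[01]|[1-9])");
        PATTERNS.put("TIME", "[0-9]{2}:[0-9]{2}:[0-9]{2}");
        PATTERNS.put("SYSLOGTIMESTAMP", "%{MONTH:month} +%{MONTHDAY:day} %{TIME:time}");
    }

    // Matches %{NAME} and %{NAME:field}.
    static final Pattern REF = Pattern.compile("%\\{(\\w+)(?::(\\w+))?\\}");

    // Recursively expand references. A named reference becomes a named group,
    // an anonymous one becomes a non-capturing group.
    static String expand(String expr, List<String> fields) {
        Matcher m = REF.matcher(expr);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(expr, last, m.start());
            String definition = PATTERNS.get(m.group(1));
            if (definition == null) {
                throw new IllegalArgumentException("unknown pattern " + m.group(1));
            }
            String body = expand(definition, fields);
            String field = m.group(2);
            if (field != null) {
                fields.add(field);
                out.append("(?<").append(field).append(">").append(body).append(")");
            } else {
                out.append("(?:").append(body).append(")");
            }
            last = m.end();
        }
        out.append(expr.substring(last));
        return out.toString();
    }

    // Returns the named matches, or an empty map when nothing matches.
    static Map<String, String> match(String expr, String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = Pattern.compile(expand(expr, fields)).matcher(input);
        Map<String, String> result = new LinkedHashMap<>();
        if (m.find()) {
            for (String field : fields) {
                result.put(field, m.group(field));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {month=Jan, day=21, time=12:10:39}
        System.out.println(match("%{SYSLOGTIMESTAMP}", "Jan 21 12:10:39"));
    }
}
```

The sketch only accepts plain field names, because java.util.regex group names are limited to letters and digits. Destinations such as [mytimestamp] would need an extra mapping step.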
Instead of filling the (implicit) root tuple, you can send all the matches elsewhere. For example:
grok("%{SYSLOGTIMESTAMP:[mytimestamp]}").on("Jan 21 12:10:39");
produces
{
"mytimestamp" : {
"month" : "Jan",
"day" : "21",
"time" : "12:10:39"
}
}
You can also send the result to a local Tuple. This is handy when you do not want to alter the root tuple of your punchlet.
Tuple tmp;
grok("%{SYSLOGTIMESTAMP:tmp:[mytimestamp]}").on("Jan 21 12:10:39");
// The matches are stored in the local variable tmp. You can
// then copy the fields you need into the root tuple:
root:[month] = tmp:[mytimestamp][month];
...
Note that the time property has not been decomposed into hour, minute and second, because the TIME pattern uses anonymous grok sub patterns. If instead of:
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
you redefine it as
TIME (?!<[0-9])%{HOUR:hour}:%{MINUTE:minute}(?::%{SECOND:second})(?![0-9])
then the same expression will produce:
{
"month" : "Jan" ,
"day" : "21" ,
"time" : "12:10:39",
"hour" : "12",
"minute" : "10",
"second" : "39"
}
That is, you specify what part must be stored in what (json) property using
named or anonymous sub grok patterns. The same holds for the top level
expression. If you write
grok("%{SYSLOGTIMESTAMP:/timestamp}").on("Jan 21 12:10:39");
and redefine all grok patterns to be anonymous (i.e. you remove all lower
case names from the MONTH, MONTHDAY, HOUR, etc. sub patterns), then you'll get:
{ "timestamp" : "Jan 21 12:10:39" }
But if you have some (at least one) named sub pattern then you'll get:
{ "timestamp" : { "month" : "Jan" , "day" : "21" , "time" : "12:10:39" } }
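The rule above (a plain string when every sub pattern is anonymous, a nested object as soon as one is named) can be sketched in plain Java as follows. The class TopLevelDestination, the capture helper and the two simplified regexes are hypothetical and only mimic the behaviour described here. They are not part of the punch API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopLevelDestination {
    // Store the match under 'field': as a string when there are no named
    // groups, as a nested map of the named groups otherwise.
    static Map<String, Object> capture(String field, Pattern pattern,
                                       List<String> names, String input) {
        Map<String, Object> root = new LinkedHashMap<>();
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return root;
        }
        if (names.isEmpty()) {
            root.put(field, m.group());
        } else {
            Map<String, String> parts = new LinkedHashMap<>();
            for (String name : names) {
                parts.put(name, m.group(name));
            }
            root.put(field, parts);
        }
        return root;
    }

    public static void main(String[] args) {
        String input = "Jan 21 12:10:39";
        Pattern anonymous = Pattern.compile("\\w{3} +\\d{1,2} \\d{2}:\\d{2}:\\d{2}");
        Pattern named = Pattern.compile(
            "(?<month>\\w{3}) +(?<day>\\d{1,2}) (?<time>\\d{2}:\\d{2}:\\d{2})");
        // prints {timestamp=Jan 21 12:10:39}
        System.out.println(capture("timestamp", anonymous, List.of(), input));
        // prints {timestamp={month=Jan, day=21, time=12:10:39}}
        System.out.println(capture("timestamp", named, List.of("month", "day", "time"), input));
    }
}
```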
You can use Oniguruma syntax in case you do not find a ready-to-use grok pattern.
String input = "AAA000DDDD";
grok("(?<queue_id>[0-9A-F]{10,11})").on(input);
produces
{ "queue_id" : "AAA000DDDD" }
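For comparison, the same named-group idea can be tried with the JDK regex engine. This is only a sketch: java.util.regex accepts letters and digits in group names but no underscore, so the group is called queueId here instead of queue_id. The class QueueIdMatch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueueIdMatch {
    // Same character class and length bounds as the grok example above.
    static final Pattern QUEUE_ID = Pattern.compile("(?<queueId>[0-9A-F]{10,11})");

    // Returns the captured queue id, or null when the input does not contain one.
    static String queueId(String input) {
        Matcher m = QUEUE_ID.matcher(input);
        return m.find() ? m.group("queueId") : null;
    }

    public static void main(String[] args) {
        System.out.println(queueId("AAA000DDDD")); // prints AAA000DDDD
    }
}
```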
You can use "[" and "]" delimiters to send the result to a nested field.
String input = "AAA000DDDD";
grok("(?<[postfix][queue_id]>[0-9A-F]{10,11})").on(input);
produces
{ "postfix" : { "queue_id" : "AAA000DDDD" }}
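The effect of a bracketed destination such as [postfix][queue_id] can be pictured as a nested map insertion. The following sketch assumes a tuple behaves like a nested Map. NestedPut and its put helper are illustrative names, not punch classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedPut {
    // Extracts each key of a path written as [key1][key2]...
    static final Pattern KEY = Pattern.compile("\\[([^\\]]+)\\]");

    // Creates intermediate maps as needed, then stores the value at the leaf.
    // Throws ClassCastException if an intermediate key already holds a non-map value.
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> root, String path, Object value) {
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(path);
        while (m.find()) {
            keys.add(m.group(1));
        }
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("no [key] found in " + path);
        }
        Map<String, Object> node = root;
        for (int i = 0; i < keys.size() - 1; i++) {
            node = (Map<String, Object>) node.computeIfAbsent(
                keys.get(i), k -> new LinkedHashMap<String, Object>());
        }
        node.put(keys.get(keys.size() - 1), value);
    }

    public static void main(String[] args) {
        Map<String, Object> root = new LinkedHashMap<>();
        put(root, "[postfix][queue_id]", "AAA000DDDD");
        System.out.println(root); // prints {postfix={queue_id=AAA000DDDD}}
    }
}
```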
You can send matches to dynamic tuple fields. Consider the following example:
Tuple t;
t:[num1] = "number1";
Tuple result;
grok("%{NUMBER:result:[%{t:[num1]}]}").on("17");
print(result);
Generates:
{
"number1" : 17
}
This can be nested arbitrarily:
Tuple t1;
t1:[user] = "bob";
Tuple t2;
t2:[age] = "age";
Tuple result;
grok("%{NUMBER:result:[%{t1:[user]}][%{t2:[age]}]}").on("17");
print(result);
It produces the tuple "result":
{
"bob" : {
"age" : 17
}
}
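One way to picture dynamic fields is as a substitution step that runs before the destination path is used: each %{tuple:[key]} placeholder is replaced by the value found in the named tuple. The sketch below assumes tuples can be modelled as maps of strings. DynamicKeys and resolve are hypothetical names, and the real operator may proceed differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DynamicKeys {
    // Matches placeholders of the form %{tupleName:[key]}.
    static final Pattern REF = Pattern.compile("%\\{(\\w+):\\[(\\w+)\\]\\}");

    // Replaces every placeholder by the value stored under 'key' in the named tuple.
    static String resolve(String path, Map<String, Map<String, String>> tuples) {
        Matcher m = REF.matcher(path);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Map<String, String> tuple = tuples.get(m.group(1));
            if (tuple == null || !tuple.containsKey(m.group(2))) {
                throw new IllegalArgumentException("cannot resolve " + m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(tuple.get(m.group(2))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> tuples = new HashMap<>();
        tuples.put("t1", Map.of("user", "bob"));
        tuples.put("t2", Map.of("age", "age"));
        // prints [bob][age]
        System.out.println(resolve("[%{t1:[user]}][%{t2:[age]}]", tuples));
    }
}
```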
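You can pass several patterns to a single grok operator. They are tried in order and, unless told otherwise, the operator stops at the first one that matches:
grok("%{WORD:[word]} %{GREEDYDATA}", "%{IP:[ip]}", "%{HOST:[host]}")
.on("55.3.244.1");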
produces:
{
"ip" : "55.3.244.1"
}
You can request it to run all patterns as follows:
grok("%{WORD:[word]} %{GREEDYDATA}", "%{IP:[ip]}", "%{HOST:[host]}")
.breakOnMatch(false).on("55.3.244.1");
It then produces:
{
"ip" : "55.3.244.1",
"host" : "55.3.244.1"
}
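The difference between stopping at the first matching pattern and running all of them can be sketched with a simple loop. The three regexes below are rough stand-ins for the WORD/GREEDYDATA, IP and HOST patterns, and MultiPattern is a hypothetical class, not the punch operator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiPattern {
    // Destination field -> simplified pattern, in evaluation order (assumptions).
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("word", Pattern.compile("^(\\w+) .*$"));
        PATTERNS.put("ip", Pattern.compile("^(\\d{1,3}(?:\\.\\d{1,3}){3})$"));
        PATTERNS.put("host", Pattern.compile("^([0-9A-Za-z][0-9A-Za-z.-]*)$"));
    }

    // Tries each pattern in order. With breakOnMatch set, stops after the first hit.
    static Map<String, String> match(String input, boolean breakOnMatch) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher m = entry.getValue().matcher(input);
            if (m.find()) {
                result.put(entry.getKey(), m.group(1));
                if (breakOnMatch) {
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(match("55.3.244.1", true));  // prints {ip=55.3.244.1}
        System.out.println(match("55.3.244.1", false)); // prints {ip=55.3.244.1, host=55.3.244.1}
    }
}
```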
By default grok considers all matches as strings. You can have a grok operator
guess the types using the inferTypes() action. For example:
grok("%{SYSLOGTIMESTAMP}").inferTypes().on("Jan 21 12:10:39");
This produces the following (note that 21 is a number, not a string):
{ "month" : "Jan" , "day" : 21 , "time" : "12:10:39" }
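Type inference of this kind can be approximated by testing each matched string against numeric formats before falling back to a string. The rules below are an assumption made for illustration. The actual inferTypes() rules are not described here and may differ.

```java
class TypeInference {
    // Returns a Long for integers, a Double for simple decimals,
    // and the unchanged String otherwise.
    static Object infer(String value) {
        if (value.matches("-?\\d+")) {
            try {
                return Long.valueOf(value);
            } catch (NumberFormatException tooLarge) {
                return value; // does not fit in a long: keep it as a string
            }
        }
        if (value.matches("-?\\d+\\.\\d+")) {
            return Double.valueOf(value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(infer("21") instanceof Long);         // prints true
        System.out.println(infer("Jan") instanceof String);      // prints true
        System.out.println(infer("12:10:39") instanceof String); // prints true
    }
}
```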
| Constructor | Description |
|---|---|
| GrokOperator() | Default constructor. |
| GrokOperator(io.krakens.grok.api.Grok grok, Tuple[] roots, TupleFetcher[] fetchers) | This constructor is not meant to be called by applications. |
| Modifier and Type | Method | Description |
|---|---|---|
| GrokOperator | breakOnMatch(boolean b) | |
| GrokOperator | evaluate() | Make the operator work in evaluation mode. |
| GrokOperator | inferTypes() | Activate type inference. |
| boolean | on(String input) | Fire the grok matching on a String. Depending on your grok pattern, matches will be dispatched to your destination tuples. |
| boolean | on(Tuple input) | Fire the grok matching. |
public GrokOperator(io.krakens.grok.api.Grok grok, Tuple[] roots, TupleFetcher[] fetchers)
Parameters:
grok - the precompiled grok expression
roots - the destination tuples
fetchers - the destination tuple fetcher
public GrokOperator()
public GrokOperator inferTypes()
Specified by: inferTypes in interface IGrokOperator
public GrokOperator evaluate()
This helps you check whether a log matches a grok pattern without first having to parse and remove a header part.
Note that this guess mode is both inefficient and not fully reliable. Moreover, it does not store any match in your destination tuple(s). Use it only to find out whether part of your log matches a pattern.
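As a rough analogy in plain Java, this kind of check corresponds to searching for a pattern anywhere in the input and keeping only the boolean outcome. The sketch below is not the evaluate() implementation. EvaluateSketch and its simplified timestamp regex are assumptions.

```java
import java.util.regex.Pattern;

class EvaluateSketch {
    // Simplified syslog timestamp, searched anywhere in the log line.
    static final Pattern TIMESTAMP = Pattern.compile(
        "\\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\\d{1,2} \\d{2}:\\d{2}:\\d{2}");

    // Only reports whether the pattern occurs. Nothing is captured or stored.
    static boolean matchesSomewhere(String log) {
        return TIMESTAMP.matcher(log).find();
    }

    public static void main(String[] args) {
        // The <13> header does not need to be parsed or removed first.
        System.out.println(matchesSomewhere("<13>Jan 21 12:10:39 host1 sshd: session opened")); // prints true
        System.out.println(matchesSomewhere("plain text")); // prints false
    }
}
```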
public boolean on(Tuple input)
Specified by: on in interface IGrokOperator
Parameters:
input - the input Tuple
public boolean on(String input)
Specified by: on in interface IGrokOperator
Parameters:
input - the input String
public GrokOperator breakOnMatch(boolean b)
Specified by: breakOnMatch in interface IGrokOperator
Copyright © 2023. All rights reserved.