
HOWTO filter out logs

Why do that

In a big data context, the ability to filter out logs is crucial to optimize storage consumption.

Depending on the type of system (transport or log management), several solutions are available.

Transport - LTR system

On LTR systems, the PunchPlatform transports logs with performance and resiliency but without parsing capabilities. Because there is no parser, the PunchPlatform cannot access a specific field in the log, which makes dropping logs more complex.

That is why we provide an efficient component to drop logs matching a specific pattern: the Apache Storm filter bolt.

For instance, the data engineer can drop all audit logs by searching every log for the pattern and filtering out the matches.

{
   "type" : "filter_bolt",
   "bolt_settings" : {
     "exclude_substring" : {
       "field" : "log",
       "substrings" : ["AUDIT"],
       "case_sentitive" : true
     }
   },
   "storm_settings" : {
     "component" : "filter_bolt"
   }
}
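
To make the effect of these settings concrete, here is a minimal Python sketch of substring exclusion on raw logs. It is not the PunchPlatform implementation: the function `keep_log`, its parameters, and the sample log lines are hypothetical and only mirror the `substrings` list and the case-sensitivity option shown in the configuration.

```python
def keep_log(log, substrings, case_sensitive=True):
    """Return True when the raw log contains none of the excluded substrings."""
    if not case_sensitive:
        # Compare in lower case so that "AUDIT" also excludes "audit".
        log = log.lower()
        substrings = [s.lower() for s in substrings]
    return not any(s in log for s in substrings)


# Made-up sample lines, for illustration only.
logs = [
    "2024-01-01 AUDIT user=alice action=login",
    "2024-01-01 INFO service started",
    "2024-01-01 audit user=bob action=logout",
]

# Case-sensitive exclusion: only the upper-case "AUDIT" line is dropped.
kept = [line for line in logs if keep_log(line, ["AUDIT"])]
```

The whole log is scanned as a single string, which is why this approach works without a parser.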

Log Management - LMC system

On LMC systems, the PunchPlatform processes the data. All fields are available and can be used intelligently to filter logs.

The punch bolt is the best component to manage filtering capabilities.

For instance:

{
   "type" : "punch_bolt",
   "bolt_settings" : {
      "punchlet" : ["custom/filtering.punch"]
   },
   "storm_settings" : {
     "component" : "punch_bolt_filtering"
   }
}

It is not recommended to run a filter bolt on a data management platform!

Punch Bolt

Because there are so many ways to filter logs, we consider the Punch bolt the most powerful tool for the job.

Data selection use cases:

Remove an event when an IP matches a network mask:

enrichment_IP.json:

[ "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.1/32" ]

custom/filtering.punch:

{
  Tuple document = [logs][log];
  Tuple internet_ip = getResourceTuple("enrichment_IP");

  if (ipmatch(internet_ip).contains(document:[init][host][ip])) {
    root.empty();
  }
}
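
The network-mask test above can be reproduced outside the platform to check a resource list before deploying it. The following Python sketch uses the standard `ipaddress` module with the same CIDR ranges as `enrichment_IP.json`. The function name `in_listed_networks` is hypothetical, and the sketch only illustrates the matching logic, not how `ipmatch` is implemented.

```python
import ipaddress

# Same ranges as enrichment_IP.json.
NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.1/32"]
]


def in_listed_networks(ip):
    """Return True when the address belongs to at least one listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)


# An event whose host IP is in one of the ranges would be dropped.
for ip in ["10.1.2.3", "172.32.0.1", "8.8.8.8"]:
    print(ip, "drop" if in_listed_networks(ip) else "keep")
```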

Remove an event when its content matches a string (exact match):

custom/filtering.punch:

{
  if ([logs][log][message].containsOneOf("LOIC", "DIMITRI", "CLAIRE")) {
    root.empty();
  }
}

Remove an event when its content matches a regex:

custom/filtering.punch:

{
  if ([logs][log][message].matches("\S+FORBIDDEN$")) {
    root.empty();
  }
}

Check the performance if you include a complex pattern. You MUST know that executing a regex takes about 100 times more CPU than a contains. Do not use regexes on high traffic.
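
The cost difference can be explored with a quick micro-benchmark before committing to a pattern. The Python sketch below times a plain substring test against a compiled regex on one made-up message. The names `with_contains` and `with_regex` are hypothetical, and the ratio measured here depends on the runtime and the pattern, so it will not necessarily match the figure quoted above for the Punch bolt.

```python
import re
import timeit

# Made-up sample message, for illustration only.
MESSAGE = "GET /admin/panel status=ACCESS_FORBIDDEN"
PATTERN = re.compile(r"\S+FORBIDDEN$")


def with_contains(message):
    """Plain substring test."""
    return "FORBIDDEN" in message


def with_regex(message):
    """Regex test using the precompiled pattern."""
    return PATTERN.search(message) is not None


contains_time = timeit.timeit(lambda: with_contains(MESSAGE), number=100_000)
regex_time = timeit.timeit(lambda: with_regex(MESSAGE), number=100_000)
print(f"contains: {contains_time:.4f}s  regex: {regex_time:.4f}s")
```

Note that the two tests are not equivalent: the regex additionally requires the match to end the message and to be preceded by non-space characters.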

Performance test

Use the log injector to compare alternatives:

punchplatform-log-injector.sh -c <configuration_file>.json --punchlets pre_processing.punch,custom/filtering.punch