
HOWTO extract logs from elasticsearch with logger

Why do that

To extract a small number of logs, for instance the result of an investigation, the punchplatform Kibana plugin is the better choice. The method below is meant to extract a large amount of logs over a continuous period of time with a simple query.

If you have access to PML, you should use it instead.

What to do

Select your data

To extract data from elasticsearch, we first build a topology that prints the result of the query.

Here is an example of the topology where you have to update the following settings:

  • es_cluster_name: the Elasticsearch cluster name (check with curl :9200)
  • es_cluster_nodes_and_ports: hostname and transport port (usually 9300) of an Elasticsearch server
  • index: the Elasticsearch index pattern (check with curl :9200/_cat/indices)
  • from_datetime and to_datetime: the time period to extract
  • timestamp_field: the timestamp field used to query the selected period
  • query: an additional filtering query (to select all, keep _type:)
{
  "tenant" : "mytenant",
  "channel" : "extractor",
  "name" : "extractor",
  "spouts" : [
    {
      "type" : "elasticsearch_spout",
      "spout_settings" : {
            "es_cluster_name" : "es_search",
            "es_cluster_nodes_and_ports" : "127.0.0.1:9300",
            "index" : "metricbeat-6.2.2-*",
            "from_datetime" : "2018-01-01T12:00:00.000Z",
            "to_datetime" : "2018-11-11T20:00:00.000Z",
            "timestamp_field" : "@timestamp",
            "query" : "?q=beat.version:6*" 
          },
          "storm_settings" : {
            "executors": 1,
            "component" : "extractor_spout",
            "publish" : [
              {
                "stream" : "default" ,
                "fields" : ["doc"]
              }
            ]
          }
      }
  ],
  "bolts" : [
    {
      "type" : "punch_bolt",
      "bolt_settings" : {
        "punchlet_code" : "{print(root);}"
      },
      "storm_settings" : {
        "executors": 1,
        "component" : "punch_bolt",
        "subscribe" : [
          {
            "component" : "extractor_spout",
            "stream" : "default",
            "grouping": "localOrShuffle"
          }
        ]
      }
    }
  ],
  "storm_settings" : {
      "metrics_consumers": [],
      "topology.builtin.metrics.bucket.size.secs": 30,
      "supervisor.monitor.frequency.secs" : 60,
      "topology.max.spout.pending" : 10000,
      "topology.enable.message.timeouts": true,
      "topology.message.timeout.secs" : 30,
      "topology.worker.childopts": "-server -Xms256m -Xmx256m",
      "topology.receiver.buffer.size": 32,
      "topology.executor.receive.buffer.size": 16384,
      "topology.executor.send.buffer.size": 16384,
      "topology.transfer.buffer.size": 32,
      "topology.worker.shared.thread.pool.size": 4,
      "topology.disruptor.wait.strategy": "com.lmax.disruptor.BlockingWaitStrategy",
      "topology.spout.wait.strategy": "org.apache.storm.spout.SleepSpoutWaitStrategy",
      "topology.sleep.spout.wait.strategy.time.ms": 50,
      "topology.workers" : 1
  }
}
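For reference, the spout settings above boil down to a single time-bounded Elasticsearch query. The sketch below is an illustration, not the spout's actual implementation: the `?q=` string becomes a query_string clause, filtered by a range on timestamp_field between from_datetime and to_datetime.

```python
import json

def equivalent_query(from_dt, to_dt, timestamp_field, query_string):
    """Build an Elasticsearch request body matching the spout settings above."""
    return {
        "query": {
            "bool": {
                "must": [{"query_string": {"query": query_string}}],
                "filter": [
                    {"range": {timestamp_field: {"gte": from_dt, "lt": to_dt}}}
                ],
            }
        }
    }

body = equivalent_query("2018-01-01T12:00:00.000Z", "2018-11-11T20:00:00.000Z",
                        "@timestamp", "beat.version:6*")
print(json.dumps(body, indent=2))
```

This also gives you a quick way to test the selection by hand against the Elasticsearch HTTP port before launching the topology.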

Run the topology with the command:

punchplatform-topology.sh --start-foreground -m local -t ./<your_topology>.json

If you don't see data in the terminal, recheck the previous settings.

If you see logs, update the punchlet_code with the following settings:

"punchlet_code" : "{logger().warn(root:[logs][log].toJson());}"

Update the logger configuration

We recommend starting the extraction in the foreground.

Back up the current logback-topology.xml file located in /data/opt/punchplatform-admin-node-X.Y.Z/bin/ (or /bin if you are in a standalone).

logback-topology.xml must be updated with the following settings:

  • fileName: the path where extracted logs are stored
  • filePattern: the name and pattern of the archives that contain the extracted logs.
<?xml version="1.0" encoding="UTF-8"?>
<Configuration monitorInterval="10"  shutdownHook="disable">
<properties>
  <property name="patternPunchlet">%msg%n</property>
  <property name="pattern">%d{yyyy-MM-dd HH:mm:ss} %c{9.} [%p] %msg%n</property>
  <property name="patternNoTime">%d{HH:mm:ss} %c{1.} [%p] %msg%n</property>
</properties>
  <Appenders>
    <!-- 
      This file setups the logging output of all punchplatform applications
      Remember these run on every cluster node. Make sure all these nodes do not fill the disk with
      logs, i.e. use rolling file appenders.
    -->  

    <Console name="STDOUT" target="SYSTEM_OUT">
      <!--        
        <ThresholdFilter level="ERROR" onMatch="ACCEPT" onMismatch="DENY"/> 
      -->
      <PatternLayout>
        <pattern>${patternNoTime}</pattern>
      </PatternLayout>
    </Console>

    <RollingFile name="FILE" fileName="${sys:punchplatform.log.dir}/${sys:logfile.name}.log" filePattern="${sys:punchplatform.log.dir}/${sys:logfile.name}.log.%i.gz">
      <PatternLayout>
        <pattern>${pattern}</pattern>
      </PatternLayout>
      <Policies>
        <SizeBasedTriggeringPolicy size="100 MB"/> <!-- Or every 100 MB -->
      </Policies>
      <DefaultRolloverStrategy max="4"/>
    </RollingFile>
    <RollingFile name="PUNCHLETLOGGER" fileName="${sys:punchplatform.log.dir}/extraction/${sys:logfile.name}.json" filePattern="${sys:punchplatform.log.dir}/extraction/${sys:logfile.name}.json.%i.gz">
      <PatternLayout>
        <pattern>${patternPunchlet}</pattern>
      </PatternLayout>
      <Policies>
        <SizeBasedTriggeringPolicy size="100 MB"/> <!-- Or every 100 MB -->
      </Policies>
      <DefaultRolloverStrategy max="1000000"/>
    </RollingFile>
  </Appenders>  

  <Loggers>

    <!-- the punchplatform info level only produces useful information --> 
    <logger name="org.thales.punch" level="info"/>  

    <!-- legacy will soon completely vanish -->
    <logger name="punchplatform" level="warn"/>  

    <!-- just in case you need it, not likely -->
    <logger name="org.apache.zookeeper" level="WARN"/>
    <logger name="org.apache.curator" level="WARN"/>

    <logger name="org.thales.punch.libraries.punchlang.api.Punchlet" level="warn" additivity="false">
      <appender-ref ref="PUNCHLETLOGGER"/>
    </logger>  

     <root level="info">
        <!-- 
          Use this to see the logs appear in stdout
          But that will affect the punchplatform-admin.sh start command and the like
          <appender-ref ref="STDOUT"/> 
        -->
        <appender-ref ref="FILE"/> 
    </root>
  </Loggers>  

</Configuration>

Finally, run the extraction:

$ punchplatform-topology.sh -m local --start-foreground --topology <topology_name>.json