
HOWTO extract logs from elasticsearch with logger

Why do that

To extract a small number of logs, for instance the result of an investigation, the punchplatform Kibana plugin is the better tool. The method described here is for extracting a large amount of logs over a continuous period of time with a simple query.

If you have access to PML, you should use it instead.

What to do

Select your data

To extract data from elasticsearch, we first build a topology that prints the result of the query.

{
  "tenant": "mytenant",
  "channel": "extractor",
  "name": "extractor",
  "dag": [
    {
      "type": "extraction_input",
      "settings": {
        "index": "*metricbeat*/doc",
        "query": "?q=beat.version:6*"
      },
      "storm_settings": {
        "executors": 1,
        "component": "extractor_spout",
        "publish": [
          {
            "stream": "default",
            "fields": [
              "doc"
            ]
          }
        ]
      }
    },
    {
      "type": "punchlet_node",
      "settings": {
        "punchlet_code": "{print(root:[default][doc].toJson());}"
      },
      "storm_settings": {
        "executors": 1,
        "component": punchlet_node",
        "subscribe": [
          {
            "component": "extractor_spout",
            "stream": "default"
          }
        ]
      }
    }
  ]
}

Run the topology with the command:

punchlinectl <your_topology>.json

If you don't see any data in the terminal, recheck the extraction_input settings to ensure that at least one document is found.

Then, update the punchlet_code field with the following settings:

"punchlet_code" : "{ logger().warn(root:[default][doc].toJson()); }"

Update the logger configuration

We recommend starting the extraction in the foreground.

Back up the current log4j2-topology.xml file located in the operator library folder. Depending on the platform installation, it can be found at:

  • standalone: <install_dir>/external/punch-operator-*/bin/log4j2-topology.xml
  • deployed: /data/opt/punch-operator-*/bin/log4j2-topology.xml
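The backup step can be scripted; here is a minimal sketch, using a hypothetical operator folder layout (on a real platform the file already exists at one of the paths listed above, so only the find and cp lines apply):

```shell
# simulate the operator folder layout (hypothetical version number;
# a real platform already provides this file)
mkdir -p punch-operator-6.1/bin
echo '<Configuration/>' > punch-operator-6.1/bin/log4j2-topology.xml

# locate the configuration file and keep a .bak copy before editing
CONF=$(find . -name log4j2-topology.xml | head -n 1)
cp "$CONF" "$CONF.bak"
```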

Now, update the previous log4j2-topology.xml file with the following settings:

  • fileName: the path where extracted logs are stored
  • filePattern: the name and pattern of the archives that contain extracted logs.

Here is an example of the original file, updated only with the new parameters. The ... stands for the original file content, which we omitted here for clarity.

<?xml version="1.0" encoding="UTF-8"?>
<Configuration monitorInterval="10" shutdownHook="disable">

    <properties>
        <property name="patternPunchlet">%msg%n</property>
        ...
    </properties>

    <Appenders>
        <RollingFile name="PUNCHLETLOGGER"
                     fileName="${sys:punchplatform.log.dir}/extraction/${sys:logfile.name}.json"
                     filePattern="${sys:punchplatform.log.dir}/extraction/${sys:logfile.name}.json.%i.gz">
            <PatternLayout>
                <pattern>${patternPunchlet}</pattern>
            </PatternLayout>
            <Policies>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
            <DefaultRolloverStrategy max="1000000"/>
        </RollingFile>
        ...
    </Appenders>

    <Loggers>
        <logger name="org.thales.punch.libraries.punchlang.api.Punchlet" level="warn" additivity="false">
            <appender-ref ref="PUNCHLETLOGGER"/>
        </logger>
        ...
    </Loggers>

</Configuration>

Finally, run the extraction:

punchlinectl <topology_name>.json

Where are the output files?

If you used the default parameters set in the log4j2-topology.xml above, you first need to resolve punchplatform.log.dir. To do so, run this command:

punchplatform-env.sh | grep PUNCHPLATFORM_LOG_DIR

From this folder, your extraction files are located under the extraction directory. Each file contains one log event per line.
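Because the output format is one JSON event per line, standard shell tools are enough to inspect the result. A sketch on simulated files (the real ones live under the extraction directory resolved above, and rolled archives are gzipped per the filePattern):

```shell
# simulate the extraction output: a plain file plus a gzipped rolled archive
mkdir -p extraction
printf '{"event":1}\n{"event":2}\n' > extraction/extractor.json
printf '{"event":3}\n' | gzip > extraction/extractor.json.1.gz

# decompress rolled archives, then count events (one per line) across all files
gunzip extraction/extractor.json.1.gz
cat extraction/extractor.json* | wc -l    # prints 3
```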