
HOWTO extract logs from elasticsearch with logger

Why do that

To extract a small number of logs, for instance the result of an investigation, the punchplatform Kibana plugin is the better tool. The method described here is for extracting a large amount of logs over a continuous period of time with a simple query.

If you have access to PML, you should use it instead.

What to do

Select your data

To extract data from elasticsearch, we first build a topology that prints the result of the query.

{
  "tenant": "mytenant",
  "channel": "extractor",
  "name": "extractor",
  "dag": [
    {
      "type": "extraction_input",
      "settings": {
        "index": "*metricbeat*/doc",
        "query": "?q=beat.version:6*"
      },
      "storm_settings": {
        "executors": 1,
        "component": "extractor_spout",
        "publish": [
          {
            "stream": "default",
            "fields": [
              "doc"
            ]
          }
        ]
      }
    },
    {
      "type": "punchlet_node",
      "settings": {
        "punchlet_code": "{print(root:[default][doc].toJson());}"
      },
      "storm_settings": {
        "executors": 1,
        "component": punchlet_node",
        "subscribe": [
          {
            "component": "extractor_spout",
            "stream": "default"
          }
        ]
      }
    }
  ]
}

Run the topology with the command:

punchlinectl <your_topology>.json

If you don't see any data in the terminal, recheck the extraction_input settings to ensure that at least one document is found.

Then, update the punchlet_code field with the following settings:

"punchlet_code" : "{ logger().warn(root:[default][doc].toJson()); }"

Update the logger configuration

We recommend starting the extraction in the foreground.

Back up the current log4j2-topology.xml file located in the operator library folder. Depending on the platform installation, it can be found at:

  • standalone: <install_dir>/external/punch-operator-*/bin/log4j2-topology.xml
  • deployed: /data/opt/punch-operator-*/bin/log4j2-topology.xml
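The backup step can be scripted; here is a minimal sketch, using a hypothetical operator folder layout (on a real platform the file already exists at one of the paths listed above, so only the find and cp lines apply):

```shell
# simulate the operator folder layout (hypothetical version number;
# a real platform already provides this file)
mkdir -p punch-operator-6.1/bin
echo '<Configuration/>' > punch-operator-6.1/bin/log4j2-topology.xml

# locate the configuration file and keep a .bak copy before editing
CONF=$(find . -name log4j2-topology.xml | head -n 1)
cp "$CONF" "$CONF.bak"
```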

Now, update the previous log4j2-topology.xml file with the following settings:

  • fileName: the path where extracted logs are stored
  • filePattern: the name and pattern of the archives that contain extracted logs.

Here is an example of the original file, updated only with the new parameters. The ... stands for the original file content, which we omitted here for clarity.

<?xml version="1.0" encoding="UTF-8"?>
<Configuration monitorInterval="10" shutdownHook="disable">

    <properties>
        <property name="patternPunchlet">%msg%n</property>
        ...
    </properties>

    <Appenders>
        <RollingFile name="PUNCHLETLOGGER"
                     fileName="${sys:punchplatform.log.dir}/extraction/${sys:logfile.name}.json"
                     filePattern="${sys:punchplatform.log.dir}/extraction/${sys:logfile.name}.json.%i.gz">
            <PatternLayout>
                <pattern>${patternPunchlet}</pattern>
            </PatternLayout>
            <Policies>
                <SizeBasedTriggeringPolicy size="100 MB"/>
            </Policies>
            <DefaultRolloverStrategy max="1000000"/>
        </RollingFile>
        ...
    </Appenders>

    <Loggers>
        <logger name="org.thales.punch.libraries.punchlang.api.Punchlet" level="warn" additivity="false">
            <appender-ref ref="PUNCHLETLOGGER"/>
        </logger>
        ...
    </Loggers>

</Configuration>

Finally, run the extraction:

punchlinectl <topology_name>.json

Where are the output files?

If you used the default parameters set in the log4j2-topology.xml above, you first need to resolve punchplatform.log.dir. To do so, run this command:

punchplatform-env.sh | grep PUNCHPLATFORM_LOG_DIR

From this folder, your extraction files are located under the extraction directory. Each file contains one log event per line.
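Because the output format is one JSON event per line, standard shell tools are enough to inspect the result. A sketch on simulated files (the real ones live under the extraction directory resolved above, and rolled archives are gzipped per the filePattern):

```shell
# simulate the extraction output: a plain file plus a gzipped rolled archive
mkdir -p extraction
printf '{"event":1}\n{"event":2}\n' > extraction/extractor.json
printf '{"event":3}\n' | gzip > extraction/extractor.json.1.gz

# decompress rolled archives, then count events (one per line) across all files
gunzip extraction/extractor.json.1.gz
cat extraction/extractor.json* | wc -l    # prints 3
```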