HOWTO replay logs from Kafka from a specific date

Why do that

Replaying logs from a given date is useful to correct a parsing error, to reinject missing logs into an external component, and so on. To replay logs from a timestamp, you would normally have to query Kafka for the offset matching that timestamp, read the topic from that offset, and then write the logs to the destination of your choice. With PunchPlatform, all you have to do is configure your topology!
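The manual lookup described above can be sketched in a few lines of Python. This is only an illustration of the principle, not what PunchPlatform runs internally: the partition is simulated as an in-memory list of (offset, timestamp) pairs, and the helper names to_epoch_millis and first_offset_at_or_after are made up for this example. Kafka's own timestamp lookup follows the same rule: it returns the earliest offset whose timestamp is greater than or equal to the requested one.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_datetime: str) -> int:
    """Convert an ISO 8601 datetime with a UTC offset to epoch milliseconds."""
    parsed = datetime.fromisoformat(iso_datetime)
    if parsed.tzinfo is None:
        raise ValueError("the datetime must carry an explicit UTC offset")
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def first_offset_at_or_after(records, target_millis):
    """Return the earliest offset whose timestamp is >= target_millis, or None.

    `records` is a list of (offset, timestamp_millis) pairs sorted by timestamp.
    """
    timestamps = [ts for _, ts in records]
    index = bisect_left(timestamps, target_millis)
    return records[index][0] if index < len(records) else None


# Simulated partition (hypothetical data): one record every 30 minutes.
partition = [
    (40, 1492588800000),  # 2017-04-19T08:00:00Z
    (41, 1492590600000),  # 2017-04-19T08:30:00Z
    (42, 1492592400000),  # 2017-04-19T09:00:00Z
    (43, 1492594200000),  # 2017-04-19T09:30:00Z
]

target = to_epoch_millis("2017-04-19T10:00:00+01:00")
print(target)                                       # 1492592400000
print(first_offset_at_or_after(partition, target))  # 42
```

A replay would then read the partition from offset 42 onwards and forward each record to its destination.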

Prerequisites

  • The logs to replay must still be available in your Kafka cluster (that is, not yet deleted by the topic retention policy).

What to do

Configure your replay topology

  • Starting from an existing topology:
    • if you can and want to stop it, stop the topology and configure its Kafka spout as described below
    • otherwise, copy the topology and configure the Kafka spout of the copy as described below
  • Add the from_datetime parameter to your Kafka spout. The start_offset_strategy should be 'last_committed'. Example:
"spouts" : [
  {
    "type" : "kafka_spout",
    "spout_settings" : {
      "start_offset_strategy" : "last_committed",
      "from_datetime" : "2017-04-19T10:00:00+01:00",
      "brokers" : "local",
      "load_control" : "none",
      "load_control.rate" : 500,
      "load_control.adaptative" : true,
      "watchdog_timeout_ms" : 120000,
      "topic" : "mytenant_apache_httpd"
    },
    "storm_settings" : {
      "executors" : 1,
      "component" : "kafka_spout",
      "publish" : [
        {
          "stream" : "logs",
          "fields" : ["log", "local_host", "local_port", "remote_host", "remote_port", "local_uuid", "local_timestamp"]
        }
      ]
    }
  }
],
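
Before launching a replay, it can be worth checking that the from_datetime value parses as intended. The sketch below is a hypothetical pre-flight check written for this page (the function check_replay_settings is not part of PunchPlatform). It assumes the value is an ISO 8601 datetime with an explicit UTC offset, as in the example, and it only looks at the from_datetime setting.

```python
import json
from datetime import datetime

# Reduced copy of the spout settings shown in the example.
SPOUT_SETTINGS = json.loads("""
{
  "start_offset_strategy": "last_committed",
  "from_datetime": "2017-04-19T10:00:00+01:00",
  "topic": "mytenant_apache_httpd"
}
""")


def check_replay_settings(settings: dict) -> datetime:
    """Return the parsed replay start date, or raise ValueError if unusable."""
    raw = settings.get("from_datetime")
    if raw is None:
        raise ValueError("from_datetime is missing: nothing to replay from")
    start = datetime.fromisoformat(raw)  # raises ValueError on malformed input
    if start.tzinfo is None:
        raise ValueError("from_datetime has no UTC offset and would be ambiguous")
    if start > datetime.now(start.tzinfo):
        raise ValueError("from_datetime is in the future")
    return start


start = check_replay_settings(SPOUT_SETTINGS)
print(start.isoformat())  # 2017-04-19T10:00:00+01:00
```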

Run the topology

  • To perform the replay scenario, run the following command:
$ punchplatform-topology.sh <YOUR_TOPOLOGY>

When PunchPlatform starts a topology with the from_datetime parameter, it creates by default a new Kafka consumer group named ...yyyyMMddHHmmss (the current date). For long replay scenarios, we advise you to define an explicit group name with the corresponding spout parameter. If your topology fails, the consumers then restart from the last committed offset, so there is no need to restart from the beginning!
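
The difference between a generated group name and a fixed one can be illustrated with a small simulation. Everything here is hypothetical: committed_offsets stands in for the offsets Kafka stores per consumer group, "replay-" is a placeholder prefix (the actual default prefix is not reproduced here), and my_replay_group is an arbitrary name. The sketch also assumes that each launch without an explicit name generates a fresh group name.

```python
from datetime import datetime

# Stand-in for the offsets Kafka keeps per consumer group (hypothetical).
committed_offsets = {}


def generated_group_name(now: datetime, prefix: str = "replay-") -> str:
    """Build a name ending in yyyyMMddHHmmss; the prefix is a placeholder."""
    return prefix + now.strftime("%Y%m%d%H%M%S")


def start_position(group: str, replay_offset: int) -> int:
    """A known group resumes from its committed offset, a new one from replay_offset."""
    return committed_offsets.get(group, replay_offset)


# First run with a fixed group name: it starts at the replay offset,
# then commits its progress before the topology fails.
first_start = start_position("my_replay_group", 42)
committed_offsets["my_replay_group"] = 1000

# After the failure, the same group resumes where it stopped.
resumed = start_position("my_replay_group", 42)
print(first_start, resumed)  # 42 1000

# A name generated at another launch time designates a different, empty group,
# so that group starts again from the replay offset.
later = generated_group_name(datetime(2017, 4, 20, 8, 30, 0))
print(later)                      # replay-20170420083000
print(start_position(later, 42))  # 42
```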