HOWTO replay logs from Kafka from a specific date
Why do that
Replaying logs from a given date can be used to correct a parsing error, reinject missing logs into an external component, and so on. To replay logs from a timestamp in Kafka, you need to query Kafka for the offset that corresponds to that timestamp, read Kafka from that offset, and then write the logs wherever you want. With PunchPlatform, you have nothing to do except configure your topology!
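To make the timestamp-to-offset step more concrete, here is a minimal Python sketch. It is not PunchPlatform code. It assumes only what Kafka itself defines: record timestamps are expressed in milliseconds since the epoch, and a lookup by timestamp returns the earliest offset whose timestamp is greater than or equal to the requested one. The function names and the in-memory `partition` list are hypothetical stand-ins for a real partition, and the sketch assumes the records are sorted by timestamp.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(iso_datetime: str) -> int:
    """Convert an ISO 8601 date with a UTC offset to milliseconds since the epoch."""
    parsed = datetime.fromisoformat(iso_datetime)
    if parsed.tzinfo is None:
        raise ValueError("an explicit UTC offset is required, e.g. +01:00")
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def first_offset_at_or_after(records, target_ms):
    """Return the earliest offset whose timestamp is >= target_ms, or None.

    `records` is a hypothetical list of (offset, timestamp_ms) pairs,
    assumed to be sorted by timestamp.
    """
    timestamps = [ts for _, ts in records]
    index = bisect_left(timestamps, target_ms)
    return records[index][0] if index < len(records) else None


# Hypothetical partition content: (offset, timestamp in ms).
partition = [
    (40, 1492588800000),
    (41, 1492592399000),
    (42, 1492592400000),
    (43, 1492596000000),
]

start = first_offset_at_or_after(partition, to_epoch_ms("2017-04-19T10:00:00+01:00"))
print(start)  # prints 42
```

The explicit UTC offset matters: the same wall-clock time with a different offset maps to a different epoch value, and therefore possibly to a different starting offset.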
Prerequisites
- The data to replay must still be available in your Kafka cluster (that is, not yet deleted by the topic retention policy).
What to do
Configure your replay topology
- From an existing topology:
  - if you can and want to stop the topology, stop it and configure its Kafka spout (see next section)
  - otherwise, copy the topology and configure the Kafka spout of the copy (see next section)
- Add the from_datetime parameter to your Kafka spout. The start_offset_strategy should be 'last_committed'. Example:
"spouts" : [
  {
    "type" : "kafka_input",
    "settings" : {
      "start_offset_strategy" : "last_committed",
      "from_datetime" : "2017-04-19T10:00:00+01:00",
      "brokers" : "local",
      "load_control" : "none",
      "load_control.rate" : 500,
      "load_control.adaptative" : true,
      "watchdog_timeout_ms" : 120000,
      "topic" : "mytenant_apache_httpd"
    },
    "storm_settings" : {
      "executors": 1,
      "component" : "kafka_input",
      "publish" : [
        {
          "stream" : "logs",
          "fields" : ["log", "local_host", "local_port", "remote_host", "remote_port", "local_uuid", "local_timestamp"]
        }
      ]
    }
  }
],
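If you go the copy route, the edit to the Kafka spout can be scripted. The following Python sketch is only an illustration, not a PunchPlatform tool: the helper name `with_replay_date` is hypothetical, and it assumes the topology is plain JSON with a top-level "spouts" list shaped like the example above.

```python
import copy
import json


def with_replay_date(topology: dict, from_datetime: str) -> dict:
    """Return a copy of the topology whose kafka_input spouts replay from a date.

    The original dictionary is left untouched, so the running topology
    definition is not modified by accident.
    """
    replay = copy.deepcopy(topology)
    for spout in replay.get("spouts", []):
        if spout.get("type") == "kafka_input":
            settings = spout.setdefault("settings", {})
            settings["start_offset_strategy"] = "last_committed"
            settings["from_datetime"] = from_datetime
    return replay


# Hypothetical, trimmed-down topology used only for this illustration.
original = {
    "spouts": [
        {
            "type": "kafka_input",
            "settings": {"brokers": "local", "topic": "mytenant_apache_httpd"},
        }
    ]
}

replay = with_replay_date(original, "2017-04-19T10:00:00+01:00")
print(json.dumps(replay, indent=2))
```

Working on a deep copy mirrors the "copy the topology" option: the existing topology keeps consuming as before while the copy is used for the replay.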
Run the topology
- To perform the replay scenario, run the following command:
punchlinectl <YOUR_TOPOLOGY>
When PunchPlatform starts a topology with the from_datetime parameter, a new Kafka consumer group is created by default, with the name: