HOWTO replay logs from files to Elasticsearch
Why do that
This document describes how to import JSON files into Elasticsearch easily.
The method takes a lightweight approach: a one-shot job.
What to do
Uncompress data
First, uncompress the data with the following command:
    $ gzip -d *.gz
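Note that `gzip -d` replaces each archive with its uncompressed file. If you would rather keep the archives and write the uncompressed files straight into the directory the topology reads (`/tmp/extraction_files/` in the example topology), the following is a minimal sketch. The function name `decompress_into` is hypothetical and not part of the platform.

```shell
# Decompress every .gz file from a source directory into a target directory,
# leaving the original archives untouched.
# Usage: decompress_into <source_dir> <target_dir>
decompress_into() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    for archive in "$src_dir"/*.gz; do
        # Skip the unexpanded pattern when no archive matches.
        [ -e "$archive" ] || continue
        name=$(basename "$archive" .gz)
        # -d decompresses, -c writes to stdout so the archive stays in place.
        gzip -dc "$archive" > "$dest_dir/$name" || return 1
    done
}
```

For example, `decompress_into ./archives /tmp/extraction_files` would fill the directory used by the example topology, assuming `./archives` holds the `.gz` files.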
Create a topology from the following example
The settings to update are:
- path: directory from which the files are read
- index: Elasticsearch index name (the "value" field of the "index" setting)
Example: FilesToES_topology.json
    {
      "spouts" : [
        {
          "type" : "file_spout",
          "spout_settings" : {
            "read_file_from_start" : true,
            "path" : "/tmp/extraction_files/",
            "load_control" : "rate",
            "load_control.rate" : 500,
            "load_control.adaptative" : true
          },
          "storm_settings" : {
            "executors": 1,
            "component" : "file_spout",
            "publish" : [
              { "stream" : "logs" , "fields" : [ "log" ] }
            ]
          }
        }
      ],
      "bolts" : [
        {
          "type": "elasticsearch_bolt",
          "bolt_settings": {
            "cluster_id": "es_search",
            "per_stream_settings" : [
              {
                "stream" : "logs",
                "index" : { "type" : "constant" , "value" : "mytenant-events" }
              }
            ]
          },
          "storm_settings": {
            "component": "ES_bolt",
            "subscribe": [
              { "component": "file_spout", "stream": "logs" }
            ]
          }
        }
      ],
      "exception_catcher_bolt" : {
        "punchlet" : "standard/common/exception_handler.punch",
        "executors" : 1
      },
      "storm_settings" : {
        "metrics_consumers": [ "org.apache.storm.metric.LoggingMetricsConsumer" ],
        "topology.builtin.metrics.bucket.size.secs": 30,
        "supervisor.monitor.frequency.secs" : 60,
        "topology.max.spout.pending" : 50000,
        "topology.enable.message.timeouts": true,
        "topology.message.timeout.secs" : 30,
        "topology.worker.childopts" : "-server -Xms2048m -Xmx2048m",
        "topology.receiver.buffer.size": 32,
        "topology.executor.receive.buffer.size": 16384,
        "topology.executor.send.buffer.size": 16384,
        "topology.transfer.buffer.size": 32,
        "topology.worker.shared.thread.pool.size": 4,
        "topology.disruptor.wait.strategy": "com.lmax.disruptor.BlockingWaitStrategy",
        "topology.spout.wait.strategy": "org.apache.storm.spout.SleepSpoutWaitStrategy",
        "topology.sleep.spout.wait.strategy.time.ms": 50,
        "topology.workers" : 1
      }
    }
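The two settings can also be filled in from the command line instead of by hand. The sketch below uses plain `sed`; the function name `set_topology_settings` is hypothetical, and the substitution assumes that the "path" and "value" keys each appear only once in the file, as they do in the example above. It is a text substitution, not a JSON-aware edit, so check the generated file before using it.

```shell
# Write a copy of a topology file with a new "path" and a new index "value".
# Usage: set_topology_settings <template.json> <output.json> <path> <index>
# Assumes each key appears once, as in the example topology, and that
# neither new value contains '#', '&' or '\'.
set_topology_settings() {
    template=$1
    output=$2
    new_path=$3
    new_index=$4
    sed -E \
        -e 's#("path"[[:space:]]*:[[:space:]]*)"[^"]*"#\1"'"$new_path"'"#' \
        -e 's#("value"[[:space:]]*:[[:space:]]*)"[^"]*"#\1"'"$new_index"'"#' \
        "$template" > "$output"
}
```

For example, `set_topology_settings FilesToES_topology.json my_topology.json /data/replay/ othertenant-events` would produce `my_topology.json` with the new directory and index name; the output file name, directory, and index name here are placeholders.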
Start the extraction
Simply run the following command:
    $ punchplatform-topology.sh -m local --start-foreground --topology <topology_name>.json
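Once the topology has run, one way to check that documents reached the index is to query the Elasticsearch `_count` endpoint. This is a minimal sketch: the helper names `count_url` and `count_documents` are hypothetical, and the Elasticsearch address is an assumption that depends on your deployment.

```shell
# Build the URL of the Elasticsearch _count endpoint for an index.
# A trailing slash on the base URL is dropped.
# Usage: count_url <es_base_url> <index>
count_url() {
    printf '%s/%s/_count\n' "${1%/}" "$2"
}

# Print the JSON response of the _count endpoint for an index.
# Usage: count_documents <es_base_url> <index>
count_documents() {
    curl -sS "$(count_url "$1" "$2")"
}
```

For example, `count_documents http://localhost:9200 mytenant-events` prints a JSON document whose `count` field is the number of documents in the index, assuming Elasticsearch is reachable at that address.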