Census Guide¶
The ‘windowed_census.punch’ punchlet lets you keep a list of the distinct values of a field, with a simple refresh mechanism. A simple example best illustrates it:
- action : a log with field ‘value1’ is received. result : the census registers ‘value1’ and sends it to the Elasticsearch output.
- action : a log with field ‘value1’ is received. result : the census already holds this value; it does nothing.
- action : a log with field ‘value2’ is received. result : the census registers ‘value2’ and sends it to the Elasticsearch output.
- action : the end of the period is reached, and a log with field ‘value1’ is received. result : the census clears its register, registers ‘value1’, and sends it to the Elasticsearch output. The former Elasticsearch entry is overwritten by the new one because they share the same identifier (the _id field).

Warning

Stateful punchlets should only be used with a clear understanding of their impact: the windowed census keeps a map of the distinct values of the field and has no memory protection!
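The steps above can be sketched in a few lines. This is a minimal illustration in Python, not the actual ‘windowed_census.punch’ implementation; the class and method names are assumptions chosen for clarity, and the injected clock is only there to make the period behavior visible.

```python
import time

class WindowedCensus:
    """Illustrative sketch of the windowed census logic: remember the
    values seen during the current period, emit a value only the first
    time it appears, and clear the register when the period expires."""

    def __init__(self, period_seconds=180, now=time.time):
        self.period = period_seconds
        self.now = now
        self.window_start = now()
        self.seen = set()  # unbounded register: no memory protection!

    def observe(self, value):
        """Return True if the value should be sent to the Elasticsearch output."""
        if self.now() - self.window_start >= self.period:
            # End of period: clear the register and start a new window.
            self.seen.clear()
            self.window_start = self.now()
        if value in self.seen:
            return False  # already registered: do nothing
        self.seen.add(value)
        return True       # register the value and emit it
```

Replaying the example scenario: the first ‘value1’ and ‘value2’ return True, the repeated ‘value1’ returns False, and once the period has elapsed ‘value1’ returns True again.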
Getting started¶
Modify your punchlets to expose the field ‘census_element_id’ in the ‘census’ stream:
[census][census_element_id] = document:[obs][host][name];
Declare the ‘census’ stream and the ‘census_element_id’ field in the publish section of your punchlet (for example the ‘parsing’ punchlet):
"publish" : [
  ...,
  { "stream" : "census", "fields" : ["census_element_id"] }
],
Add a new Punch bolt and a new Elasticsearch bolt to process the new stream with ‘windowed_census.punch’:
{
  "type" : "punch_bolt",
  "bolt_settings" : {
    "punchlet" : [ "standard/common/windowed_census.punch" ]
  },
  "storm_settings" : {
    "executors" : 1,
    "component" : "punch_bolt_census",
    "publish" : [
      {
        "stream" : "census",
        "fields" : [ "census_element", "es_index", "es_type", "census_element_id" ]
      }
    ],
    "subscribe" : [
      {
        "component" : "punch_bolt",
        "stream" : "census",
        "grouping" : "localOrShuffle"
      }
    ]
  }
},
{
  "type" : "elasticsearch_bolt",
  "bolt_settings" : {
    "watchdog_timeout" : "1h",
    "batch_size" : 1,
    "queue_size" : 10,
    "batch_interval" : 10,
    "data_field" : "census_element",
    "index_field" : "es_index",
    "type_field" : "es_type",
    "id_field" : "census_element_id",
    "es_timestamp" : "ts",
    "es_timestamp_format" : "iso",
    "request_timeout" : "20s",
    "cluster_id" : "{{channel.output.elasticsearch.cluster}}",
    "error_timestamp" : "ts",
    "document_type" : "census_element",
    "error_type" : "census_element",
    "index_pattern_prefix" : "census-",
    "index_pattern_date_suffix" : "yyyy.MM.dd",
    "index_failed" : true
  },
  "storm_settings" : {
    "executors" : 1,
    "component" : "elasticsearch_bolt_census",
    "subscribe" : [
      {
        "component" : "punch_bolt_census",
        "stream" : "census",
        "grouping" : "localOrShuffle"
      }
    ]
  }
}
By default, ‘windowed_census.punch’ sets the period to 180 seconds. You can change this property directly in the punchlet:
....
long period = 180;
....
Add a new mapping template “mapping_census.json” in Elasticsearch:
{
  "order" : 100,
  "template" : "census*",
  "settings" : { },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : false },
      "_source" : { "enabled" : false },
      "_timestamp" : { "enabled" : false },
      "date_detection" : false,
      "numeric_detection" : false,
      "properties" : {
        "census_element_id" : { "type" : "string", "index" : "not_analyzed" },
        "ts" : { "type" : "date", "format" : "epoch_millis||strictDateOptionalTime", "index" : "not_analyzed" }
      }
    }
  }
}
Configure and start your channels, then inject some logs.
In Kibana, create a new index pattern ‘census-*’ with ‘ts’ as the ‘Time-field name’, and watch the results in the ‘Discover’ panel. Enjoy!