Census Guide

The ‘windowed_census.punch’ punchlet keeps a list of the distinct values taken by a field, with a simple refresh mechanism. The mechanism is easiest to understand through an example:

action: field 'value1' is received
result: the census registers 'value1' and sends it to the Elasticsearch output

action: field 'value1' is received
result: the census already holds this value. It does nothing

action: field 'value2' is received
result: the census registers 'value2' and sends it to the Elasticsearch output

action: the end of the period is reached, then field 'value1' is received
result: the census clears the register, registers 'value1' again and sends it to the Elasticsearch output. The new document overrides the former Elasticsearch entry because both have the same identifier (_id field)
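
The sequence above can be replayed with a small Python model. This is only a sketch of the described behaviour, not the code of ‘windowed_census.punch’: the class name, the injected clock and the dictionary standing in for Elasticsearch are all assumptions made for illustration, and the real punchlet may align its periods differently.

```python
import time


class WindowedCensus:
    """Toy model of the windowed census (hypothetical, not the punchlet)."""

    def __init__(self, period_seconds=180, clock=time.monotonic):
        self.period = period_seconds
        self.clock = clock
        self.window_start = clock()
        self.register = set()

    def receive(self, value):
        """Return the value if it must be sent to the output, else None."""
        now = self.clock()
        if now - self.window_start >= self.period:
            # End of the period: forget everything seen so far.
            self.register.clear()
            self.window_start = now
        if value in self.register:
            return None  # already registered in this period: do nothing
        self.register.add(value)
        return value  # new in this period: send it


# A fake clock so the example does not have to wait 180 seconds.
now = [0.0]
census = WindowedCensus(period_seconds=180, clock=lambda: now[0])

# Stand-in for Elasticsearch: documents are keyed by _id, so sending the
# same identifier again overwrites the previous entry.
index = {}
emitted = []
for t, value in [(0, "value1"), (10, "value1"), (20, "value2"), (200, "value1")]:
    now[0] = t
    out = census.receive(value)
    emitted.append(out)
    if out is not None:
        index[out] = {"census_element_id": out, "ts": t}

print(emitted)        # ['value1', None, 'value2', 'value1']
print(sorted(index))  # ['value1', 'value2']
```

The last event is sent again after the period has elapsed, but the fake index still holds a single 'value1' entry, now carrying the most recent timestamp.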

Warning

A stateful punchlet should only be used with a full understanding of its impact. The windowed census keeps a map of the distinct values of the field and has no memory protection!
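
To get a feel for this warning, the following Python sketch counts how many entries a register would hold after one period, depending on the field chosen as identifier. The helper function and the sample field values are hypothetical; the sketch only assumes that the register holds one entry per distinct value, as described above.

```python
def register_size(values):
    """Number of entries held after seeing `values` within one period."""
    register = set()
    for value in values:
        register.add(value)
    return len(register)


# Hypothetical traffic: 100 000 events in one period.
hosts = [f"host-{i % 50}" for i in range(100_000)]      # 50 distinct hosts
sessions = [f"session-{i}" for i in range(100_000)]     # every value distinct

print(register_size(hosts))     # 50
print(register_size(sessions))  # 100000
```

A field with few distinct values, such as a host name, keeps the register small. A field that is nearly unique per event makes the register grow with the traffic until the end of the period.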

Getting started

  1. Modify your punchlets to expose the field ‘census_element_id’ in the stream ‘census’

    [census][census_element_id] = document:[obs][host][name];
    
  2. Declare the stream ‘census’ and the field ‘census_element_id’ in the publish section of your punchlet (for example the ‘parsing’ punchlet)

    "publish" : [
      ...,
      {
        "stream" : "census",
        "fields" : ["census_element_id"]
      }
    ],
    
  3. Add a new Punch bolt and a new Elasticsearch bolt to process the new stream with ‘windowed_census.punch’

    {
      "type" : "punch_bolt",
      "bolt_settings" : {
        "punchlet" : [ "standard/common/windowed_census.punch"]
      },
      "storm_settings" : {
        "executors": 1,
        "component" : "punch_bolt_census",
        "publish" : [
          {
            "stream" : "census",
            "fields" : ["census_element", "es_index", "es_type", "census_element_id" ]
          }
        ],
        "subscribe" : [
          {
            "component" : "punch_bolt",
            "stream" : "census",
            "grouping": "localOrShuffle"
          }
        ]
      }
    },
    {
      "type" : "elasticsearch_bolt",
      "bolt_settings" : {
        "watchdog_timeout" : "1h",
        "batch_size" : 1,
        "queue_size" : 10,
        "batch_interval" : 10,
        "data_field" : "census_element",
        "index_field" : "es_index",
        "type_field" : "es_type",
        "id_field" : "census_element_id",
        "es_timestamp" : "ts",
        "es_timestamp_format" : "iso",
        "request_timeout" : "20s",
        "cluster_id" : "{{channel.output.elasticsearch.cluster}}",
        "error_timestamp" : "ts",
        "document_type" : "census_element",
        "error_type" : "census_element",
        "index_pattern_prefix" : "census-",
        "index_pattern_date_suffix" : "yyyy.MM.dd",
        "index_failed" : true
      },
      "storm_settings" : {
        "executors" : 1,
        "component" : "elasticsearch_bolt_census",
        "subscribe" : [
          {
            "component" : "punch_bolt_census",
            "stream" : "census",
            "grouping" : "localOrShuffle"
          }
        ]
      }
    }
    
  4. By default, ‘windowed_census.punch’ sets the period to 180 seconds. You can change this value directly in the punchlet

    ....
    long period = 180;
    ....
    
  5. Add a new mapping to Elasticsearch, “mapping_census.json”

    {
        "order" : 100,
        "template" : "census*",
        "settings": { },
        "mappings" : {
            "_default_" : {
                "_all" : { "enabled" : false },
                "_source" : { "enabled" : false },
                "_timestamp" : { "enabled" : false },
                "date_detection" : false,
                "numeric_detection": false
            },
            "properties" : {
                "census_element_id" : { "type" : "string", "index" : "not_analyzed" },
                "ts" : { "type" : "date", "format" : "epoch_millis||strictDateOptionalTime", "index" : "not_analyzed" }
            }
        }
    }
    
  6. Configure and start your channels, then inject some logs

  7. In Kibana, create a new index pattern ‘census-*’ with ‘ts’ as the ‘Time-field name’, then check the result in the ‘Discover’ panel. Enjoy!
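
Steps 3, 5 and 7 only work together if the index names line up. The Python sketch below checks this under one assumption that the guide does not state explicitly: that the Elasticsearch bolt names its indices by appending the current date, formatted as ‘yyyy.MM.dd’, to the ‘census-’ prefix. The helper name and the sample date are arbitrary, and Python wildcard matching is used here as an approximation of the Elasticsearch and Kibana patterns.

```python
from datetime import date
from fnmatch import fnmatchcase


def daily_index(prefix, day):
    """Build a daily index name (assumed naming scheme, see above)."""
    # The Java-style pattern "yyyy.MM.dd" corresponds to "%Y.%m.%d" here.
    return prefix + day.strftime("%Y.%m.%d")


name = daily_index("census-", date(2017, 3, 9))  # arbitrary sample date

print(name)                           # census-2017.03.09
print(fnmatchcase(name, "census*"))   # True: covered by the mapping of step 5
print(fnmatchcase(name, "census-*"))  # True: covered by the Kibana index pattern
```

If the assumption holds, an identifier seen on two different days is written to two different indices, so the override by _id applies within a single day's index.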