Generator Input

The GeneratorSpout simply publishes fake data. It can be used to run unit tests or to help you design topologies.

Here is a simple complete configuration example to start with.

- type: generator_input
  settings:
    expectation: none
    acked:
    messages:
      - "my first log"
      - "My second message log"
      - "And finally a third one"
  publish:
  - stream: logs
    fields:
    - log

Complete examples

The Generator Spout can work in two different ways:

Publish on a unique stream and field
Publish on various streams and fields

In the first case, messages contains an array of text logs, just like in the previous example. Each message is sent to the stream declared under publish (logs in the previous example).

In the second case, the messages property is an array of JSON objects. Note that no publish key is defined in the storm_settings section. Instead, each message fully defines its own stream and fields.

Here is a complete example:

{
  type: generator_input
  component: generator
  settings: {
    messages: [
      { 
        logs: {  
          log: my first log 
        }
      }
      { 
        logs: {
          foo1: bar
          foo2: baar
          foo3: baaar 
        }
      }
      {
        other: {
          log: "Here I am on another stream!"
        }
      }
    ]
  }
}
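
To make the difference between the two modes concrete, here is a minimal Python sketch of how a configured message could be resolved to a stream and its fields. It is an illustration only, not the spout's actual code: the function name resolve_message is hypothetical, and storing a text message in the first field declared under publish is an assumption made for this example.

```python
def resolve_message(message, publish=None):
    """Return a list of (stream, fields) pairs for one configured message.

    Illustrative model only: a plain string relies on the 'publish'
    section, while a dictionary carries its own stream(s) and fields.
    """
    if isinstance(message, str):
        if not publish:
            raise ValueError("text messages need a 'publish' section")
        # Assumption: the text goes into the first declared field of each stream.
        return [(p["stream"], {p["fields"][0]: message}) for p in publish]
    # Second mode: {stream_name: {field_name: value, ...}}
    return [(stream, dict(fields)) for stream, fields in message.items()]


# First mode: text messages plus a publish section.
publish = [{"stream": "logs", "fields": ["log"]}]
print(resolve_message("my first log", publish))

# Second mode: the message names its own stream and fields.
print(resolve_message({"other": {"log": "Here I am on another stream!"}}))
```

In this model, the second mode needs no publish section because the stream name is the key of each message object.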

Load generation

If you need a lot of messages but do not want to copy-paste thousands of lines, you can use the "messages_count" setting to indicate the total number of messages to generate.

The provided "messages" list is reused again and again until the requested number of messages has been emitted.

By default, the spout waits 1 second between two messages. If you want faster emission, use the "interval" setting to provide the (approximate) number of milliseconds to wait between two messages. A value of 0 gives the best speed the spout can achieve.
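
To see why the interval matters for load generation, the short Python calculation below estimates how long a run would last. It is a rough sketch that assumes one full interval per emitted message and ignores processing time. The helper name estimated_duration_seconds is hypothetical.

```python
def estimated_duration_seconds(messages_count, interval_ms=1000):
    """Rough estimate of a run's duration: one interval per emitted message."""
    return messages_count * interval_ms / 1000.0


# One million messages with the default 1000 ms interval, expressed in days.
print(estimated_duration_seconds(1_000_000) / 86_400)

# The same volume with a 1 ms interval, expressed in seconds.
print(estimated_duration_seconds(1_000_000, interval_ms=1))
```

Under these assumptions, a million messages at the default interval would take more than eleven days, which is why load-generation setups usually lower the interval.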

If you need some "variation" between the messages generated from the fixed "messages" list, you can include the %{message_num} special tag inside your message strings. It is replaced by the message number (starting at 1).
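
The behaviour described above can be modelled with a few lines of Python. This is a sketch, not the spout's implementation: it assumes the list is cycled in order and that the message number counts every emitted message across the whole run. The function name generate is hypothetical.

```python
from itertools import cycle, islice


def generate(messages, messages_count=None):
    """Yield message strings, cycling through the list and numbering from 1."""
    if messages_count is None:
        # Documented default: as many messages as the list contains.
        messages_count = len(messages)
    templates = islice(cycle(messages), messages_count)
    for num, template in enumerate(templates, start=1):
        # Replace the special tag with the running message number.
        yield template.replace("%{message_num}", str(num))


for line in generate(["first %{message_num}", "second %{message_num}"], messages_count=5):
    print(line)
```

With a two-entry list and a count of 5, this model emits the first entry three times and the second entry twice, each with a distinct number.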

This is an example of a load-generator topology that sends a million log documents with different document ids and contents (here, to load an Elasticsearch cluster):

{
  tenant: validation-kafka
  channel: kafka
  name: single
  dag: [
    {
      type: generator_input
      component: generator
      settings: {
        messages_count : 1000000
        interval : 0
        messages: [
          {
            logs: {
              log: "## LOG %{message_num} ##"
              _ppf_id: "msg-%{message_num}"
            }
          }
        ]
      }
    }
    {
      type: elasticsearch_output
      settings: {
        cluster_id: es_search
        reindex_failed_documents : true
        error_index : {
          type : daily
          prefix : mytenant-events-indexation-errors-
        }
        per_stream_settings: [
          {
            stream: logs
            index: {
              type: daily
              prefix: mytenant-events-
            }
            document_value_fields: [
              log
            ]
            document_id_field : _ppf_id
            additional_document_value_fields: [
              {
                type: date
                document_field: "@timestamp"
                format: iso
              }
            ]
          }
        ]
      }
      subscribe: [
        {
          component: generator
          stream: logs
        }
      ]
    }
  ]
  storm_settings : {
    topology.worker.childopts : "-Xmx1G -Xms1G"
    xtopology.max.spout.pending : 30000
  }
}

Parameters

interval: Number 1000

OPTIONAL: Interval, in milliseconds, between two emitted messages. The default value is 1000 (1 second).

messages_count: Number

OPTIONAL: If you want to generate a large number of messages, you can provide the "messages_count" setting. The generator then sends the messages multiple times until the required message count is reached. The default value is the number of entries in the "messages" list.