
Generator Input

The GeneratorSpout simply publishes fake data. It can be used to run unit tests or to help you design topologies.

Here is a minimal configuration example.

{
  type: generator_input
  settings: {
    messages: [
      my first log
      My second message log
      And finally a third one
    ]
  }
}

Complete examples

The Generator Spout can work in two ways:

  1. Publish on a single stream and field
  2. Publish on several streams and fields

In the first case, messages contains an array of text logs. Each message is emitted on the stream and field declared in the publish section (here, stream logs and field log).

Here is a complete example:

{
  type: generator_input
  component: generator
  settings: {
    messages: [
      my first log
      My second message log
      And finally a third one
    ]
  }
  publish: [
    {
      stream: logs
      fields: [
        log
      ]
    }
  ]
}
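The mapping in this first mode can be sketched as follows. This is a minimal illustration of the behaviour described above, not the spout's actual code: the function name emit_plain_messages and the (stream, {field: value}) tuple representation are hypothetical, and the sketch assumes exactly one publish entry with exactly one field.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one emitted tuple: (stream name, {field: value}).
Emitted = Tuple[str, Dict[str, str]]


def emit_plain_messages(messages: List[str], publish: List[dict]) -> List[Emitted]:
    """Map each plain-text message onto the single stream/field declared in publish.

    Assumption: plain-text mode uses one publish entry holding one field.
    """
    if len(publish) != 1 or len(publish[0]["fields"]) != 1:
        raise ValueError("plain-text mode expects one stream with one field")
    stream = publish[0]["stream"]
    field = publish[0]["fields"][0]
    # Every string in messages becomes the value of that single field.
    return [(stream, {field: msg}) for msg in messages]


if __name__ == "__main__":
    publish = [{"stream": "logs", "fields": ["log"]}]
    messages = ["my first log", "My second message log", "And finally a third one"]
    for stream, fields in emit_plain_messages(messages, publish):
        print(stream, fields)
```

With the configuration above, each of the three strings would be emitted on the logs stream as the value of the log field.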

In the second case, the messages property is an array of JSON objects. Note that no publish key is defined for this component. Instead, each message fully defines its stream (the top-level key) and its fields (the keys of the nested object).

Here is a complete example:

{
  type: generator_input
  component: generator
  settings: {
    messages: [
      { 
        logs: {  
          log: my first log 
        }
      }
      { 
        logs: {
          foo1: bar
          foo2: baar
          foo3: baaar 
        }
      }
      {
        other: {
          log: Here I am on another stream! 
        }
      }
    ]
  }
}
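The way a structured message is split into a stream and its fields can be sketched as below. This is an illustration under stated assumptions, not the spout's implementation: emit_structured_messages is a hypothetical name, and each message is assumed to be a JSON object whose top-level key names the stream and whose nested keys name the fields.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one emitted tuple: (stream name, {field: value}).
Emitted = Tuple[str, Dict[str, str]]


def emit_structured_messages(messages: List[Dict[str, Dict[str, str]]]) -> List[Emitted]:
    """Turn messages shaped like {stream: {field: value, ...}} into emitted tuples."""
    emitted: List[Emitted] = []
    for message in messages:
        # The top-level key is the stream; the nested object holds the fields.
        for stream, fields in message.items():
            emitted.append((stream, dict(fields)))
    return emitted


if __name__ == "__main__":
    messages = [
        {"logs": {"log": "my first log"}},
        {"logs": {"foo1": "bar", "foo2": "baar", "foo3": "baaar"}},
        {"other": {"log": "Here I am on another stream!"}},
    ]
    for stream, fields in emit_structured_messages(messages):
        print(stream, fields)
```

In this example, two messages go to the logs stream (with different field sets) and one goes to the other stream, which is why no single publish declaration is needed.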

Load generation

If you need a lot of messages but do not want to copy-paste thousands of lines, use the "messages_count" setting to set the total number of messages to generate.

The provided "messages" list is replayed again and again until the requested number of messages has been emitted.
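The replay behaviour amounts to cycling over the list and stopping after messages_count items. The sketch below uses Python's itertools.cycle and itertools.islice to illustrate it; the function name replay is hypothetical, and treating a missing messages_count as "one pass over the list" follows the default stated in the Parameters section.

```python
from itertools import cycle, islice
from typing import List, Optional


def replay(messages: List[str], messages_count: Optional[int] = None) -> List[str]:
    """Replay the messages list in a loop until messages_count items are produced.

    When messages_count is None, each message is sent exactly once.
    """
    if messages_count is None:
        messages_count = len(messages)
    return list(islice(cycle(messages), messages_count))


if __name__ == "__main__":
    print(replay(["a", "b", "c"], 7))  # ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```

If messages_count is not a multiple of the list length, the last pass over the list is simply cut short.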

By default, the spout waits 1 second between two messages. For faster emission, set the "interval" setting to the (approximate) number of milliseconds to wait between two generated messages. A value of 0 makes the spout emit as fast as it can.
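The pacing can be pictured as a loop that pauses between two emissions. This is a simplified sketch, not the spout's code (a Storm spout emits from its own callback rather than a plain loop): paced_emit and its emit callback are hypothetical names, and the pause is assumed to happen between messages, not before the first one.

```python
import time
from typing import Callable, Iterable


def paced_emit(messages: Iterable[str], emit: Callable[[str], None], interval_ms: int = 1000) -> None:
    """Emit messages one by one, sleeping about interval_ms between two emissions.

    interval_ms=0 disables the pause, so messages are emitted as fast as the loop runs.
    The wait is approximate: time.sleep may sleep slightly longer than requested.
    """
    for i, msg in enumerate(messages):
        if i > 0 and interval_ms > 0:
            time.sleep(interval_ms / 1000.0)
        emit(msg)


if __name__ == "__main__":
    paced_emit(["first", "second", "third"], print, interval_ms=200)
```

The sleep only bounds the rate from above: the actual throughput also depends on how fast downstream components consume the tuples.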

If you need some "variation" between the messages generated from the fixed "messages" list, you can include the %{message_num} special tag in your message strings. It is replaced by the message number (starting at 1).
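The tag substitution can be sketched as a recursive string replacement over a message. This is an illustrative sketch only: substitute_message_num is a hypothetical name, and two behaviours are assumptions not confirmed by this page, namely that the tag is replaced in string values only (not in keys) and that nested structured messages are handled the same way as plain strings.

```python
from typing import Any

TAG = "%{message_num}"


def substitute_message_num(value: Any, message_num: int) -> Any:
    """Recursively replace the %{message_num} tag in every string value of a message.

    Handles plain strings as well as nested dicts and lists; keys are left untouched.
    """
    if isinstance(value, str):
        return value.replace(TAG, str(message_num))
    if isinstance(value, dict):
        return {k: substitute_message_num(v, message_num) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_message_num(v, message_num) for v in value]
    return value  # numbers, booleans, etc. are returned unchanged


if __name__ == "__main__":
    template = {"logs": {"log": "## LOG %{message_num} ##", "_ppf_id": "msg-%{message_num}"}}
    for num in range(1, 4):
        print(substitute_message_num(template, num))
```

Applied to the template used in the load-generation example below, message 3 would become {"logs": {"log": "## LOG 3 ##", "_ppf_id": "msg-3"}}, which gives every document a distinct id.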

Here is an example of a load-generator topology that sends one million log documents, each with a distinct document id and content (here, to load an Elasticsearch cluster):

{
  tenant: validation-kafka
  channel: kafka
  name: single
  dag: [
    {
      type: generator_input
      component: generator
      settings: {
        messages_count : 1000000
        interval : 0
        messages: [
          {
            logs: {
              log: ## LOG %{message_num} ##
              _ppf_id: msg-%{message_num}
            }
          }
        ]
      }
    }
    {
      type: elasticsearch_output
      settings: {
        cluster_id: es_search
        reindex_failed_documents : true
        error_index : {
          type : daily
          prefix : mytenant-events-indexation-errors-
        }
        per_stream_settings: [
          {
            stream: logs
            index: {
              type: daily
              prefix: mytenant-events-
            }
            document_value_fields: [
              log
            ]
            document_id_field : _ppf_id
            additional_document_value_fields: [
              {
                type: date
                document_field: @timestamp
                format: iso
              }
            ]
          }
        ]
      }
      subscribe: [
        {
          component: generator
          stream: logs
        }
      ]
    }
  ]
  storm_settings : {
    topology.worker.childopts : "-Xmx1G -Xms1G"
    topology.max.spout.pending : 30000
  }
}

Parameters

  • interval: Number, default 1000

OPTIONAL: Interval of time, in milliseconds, between two emitted messages. Defaults to 1000 (1 second).

  • messages_count: Number

OPTIONAL: Total number of messages to generate. If it is larger than the number of entries in "messages", the list is replayed until the requested count is reached. Defaults to the number of entries in "messages", so each message is sent once.
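The two defaults above can be combined into a quick estimate of how long a generator runs. The sketch below is a hypothetical helper (resolve_settings and min_duration_seconds are illustrative names, not part of the platform) that applies the documented defaults and computes a lower bound on run time, assuming one pause of interval milliseconds between two consecutive messages. It shows why the load-generation example sets interval to 0.

```python
from typing import Any, Dict

DEFAULT_INTERVAL_MS = 1000  # documented default: 1 second


def resolve_settings(settings: Dict[str, Any]) -> Dict[str, int]:
    """Apply the documented defaults for interval and messages_count."""
    messages = settings["messages"]
    return {
        "interval": settings.get("interval", DEFAULT_INTERVAL_MS),
        "messages_count": settings.get("messages_count", len(messages)),
    }


def min_duration_seconds(settings: Dict[str, Any]) -> float:
    """Lower bound on run time from the pauses alone: (messages_count - 1) waits."""
    resolved = resolve_settings(settings)
    waits = max(resolved["messages_count"] - 1, 0)
    return waits * resolved["interval"] / 1000.0


if __name__ == "__main__":
    slow = {"messages": ["x"], "messages_count": 1_000_000}
    print(min_duration_seconds(slow) / 86400)  # about 11.6 days at the default interval
    fast = dict(slow, interval=0)
    print(min_duration_seconds(fast))  # 0.0
```

With the default 1 second interval, one million messages would take roughly 11.6 days; with interval set to 0, the pauses disappear and throughput is bounded only by the topology itself.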