Generator Input

The GeneratorSpout simply publishes fake data. It can be used to run unit tests or to help you design topologies.

Here is a simple complete configuration example to start with.

- type: generator_input
  settings:
    expectation: none
    messages:
      - "my first log"
      - "My second message log"
      - "And finally a third one"
  publish:
  - stream: logs
    fields:
    - log

Complete examples

The Generator Spout can work in two different ways:

  1. Publish on a unique stream and field
  2. Publish on various streams and fields

In the first case, the "messages" setting contains an array of text logs, just like in the previous example. Each message will be sent to the stream and field declared in the publish section.

In the second case, the "messages" property will be an array of JSON documents. Note that no publish key is defined for the component. Instead, each message completely defines its stream and fields.

Here is a complete example:

{
  type: generator_input
  component: generator
  settings: {
    messages: [
      { 
        logs: {  
          log: my first log 
        }
      }
      { 
        logs: {
          foo1: bar
          foo2: baar
          foo3: baaar 
        }
      }
      {
        other: {
          log: Here I am on another stream! 
        }
      }
    ]
  }
}

Load generation

If you need a lot of messages, but do not want to copy-paste thousands of lines, you can use the "messages_count" setting to indicate the total number of messages to be generated.

The provided "messages" list will be reused again and again until the requested number of messages has been emitted.

By default, a 1 second interval is observed between messages. If you want faster emission, use the "interval" setting, providing the (approximate) number of milliseconds to wait between two message generations. A value of 0 gives the best speed the spout can achieve.

If you need some variation between the messages generated from the fixed "messages" list, you can include the %{message_num} special tag inside your message strings. It will be replaced by the message number (starting at 1).
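
As a small sketch of how these settings combine (the message contents here are hypothetical), the following generator settings cycle twice over a two-message list:

{
  type: generator_input
  component: generator
  settings: {
    messages_count: 4
    interval: 0
    messages: [
      { logs: { log: type A message %{message_num} } }
      { logs: { log: type B message %{message_num} } }
    ]
  }
}

With these settings, the generator should emit four tuples on the logs stream, cycling over the list: "type A message 1", "type B message 2", "type A message 3", "type B message 4".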

Here is an example of a load-generator topology sending a million log documents with different document ids and contents (here to load an Elasticsearch cluster):

{
  tenant: validation-kafka
  channel: kafka
  name: single
  dag: [
    {
      type: generator_input
      component: generator
      settings: {
        messages_count : 1000000
        interval : 0
        messages: [
          {
            logs: {
              log: ## LOG %{message_num} ##
              _ppf_id: msg-%{message_num}
            }
          }
        ]
      }
    }
    {
      type: elasticsearch_output
      settings: {
        cluster_id: es_search
        reindex_failed_documents : true
        error_index : {
          type : daily
          prefix : mytenant-events-indexation-errors-
        }
        per_stream_settings: [
          {
            stream: logs
            index: {
              type: daily
              prefix: mytenant-events-
            }
            document_value_fields: [
              log
            ]
            document_id_field : _ppf_id
            additional_document_value_fields: [
              {
                type: date
                document_field: @timestamp
                format: iso
              }
            ]
          }
        ]
      }
      subscribe: [
        {
          component: generator
          stream: logs
        }
      ]
    }
  ]
  storm_settings : {
    topology.worker.childopts : "-Xmx1G -Xms1G"
    xtopology.max.spout.pending : 30000
  }
}

Parameters

  • interval: Number - default: 1000

OPTIONAL: Time interval, in milliseconds, between the emission of two consecutive messages. The default value is 1000 ms (1 second).

  • messages_count: Number - default: number of messages

OPTIONAL: If you want to generate a large number of messages, provide the "messages_count" setting; the generator will then send the messages multiple times until the required count is reached. Its default value is the number of messages in the "messages" list.

  • expectation: String - default: NONE

OPTIONAL: The allowed values (case is ignored) for the 'expectation' setting of the generator spout are:

  - ALL_ACKED_NO_FAIL: the JVM will exit with rc=1 if a tuple fails; it will exit with rc=0 if all emitted tuples are acked.
  - ALL_ACKED: any failed tuple will be re-emitted by the generator spout (with no change). The JVM will exit with rc=0 once all tuples are finally acked.
  - NONE: the JVM will not exit on its own. A message will report once a result (ack or fail) has been received for all emitted tuples (no re-emission in case of failure).
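
For instance, a generator used as a test input could be configured as follows (a minimal sketch, with hypothetical test messages) so that the JVM exits with rc=1 as soon as a tuple fails, and with rc=0 once all emitted tuples are acked:

{
  type: generator_input
  component: generator
  settings: {
    expectation: all_acked_no_fail
    messages: [
      { logs: { log: first test log } }
      { logs: { log: second test log } }
    ]
  }
}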