GeneratorSpout

The GeneratorSpout simply publishes fake data. You can use it to run unit tests or to help you design topologies.

Here is a complete configuration example.

{
  "type": "generator_spout",
  "spout_settings": {
    "messages": [
      "my first log",
      "My second message log",
      "And finally a third one"
    ]
  },
  "storm_settings": {...}
}

Complete examples

The GeneratorSpout can work in two ways:

  1. Publish on a single stream and field
  2. Publish on several streams and fields

In the first case, "messages" contains an array of text logs. Each message is sent on the single stream and field declared in the "publish" section of "storm_settings".

Here is a complete example:

{
  "type": "generator_spout",
  "spout_settings": {
    "messages": [
      "my first log",
      "My second message log",
      "And finally a third one"
    ]
  },
  "storm_settings": {
    "component": "generator",
    "publish": [
      {
        "stream": "logs",
        "fields": [
          "log"
        ]
      }
    ]
  }
}
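The following minimal Python sketch makes the single-stream mode concrete. It is not the spout's actual implementation. It reads the configuration above and lists the tuples the spout would publish, assuming each text message is emitted, in order, as the value of the single declared field on the single declared stream. The helper name single_stream_tuples and its "one stream, one field" check are assumptions of this illustration.

```python
import json

# The configuration from the example above, as a JSON string.
CONFIG = """
{
  "type": "generator_spout",
  "spout_settings": {
    "messages": ["my first log", "My second message log", "And finally a third one"]
  },
  "storm_settings": {
    "component": "generator",
    "publish": [{"stream": "logs", "fields": ["log"]}]
  }
}
"""


def single_stream_tuples(config):
    """Hypothetical helper: map each text message to (stream, {field: message})."""
    publish = config["storm_settings"]["publish"]
    # Assumption of this sketch: single-stream mode declares one stream with one field.
    if len(publish) != 1 or len(publish[0]["fields"]) != 1:
        raise ValueError("expected exactly one stream with one field")
    stream = publish[0]["stream"]
    field = publish[0]["fields"][0]
    return [(stream, {field: msg}) for msg in config["spout_settings"]["messages"]]


for stream, values in single_stream_tuples(json.loads(CONFIG)):
    print(stream, values)
```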

In the second case, "messages" is an array of JSON objects. Note that no "publish" key is defined in the "storm_settings" section. Instead, each message completely defines its stream and fields: the object's key is the stream name, and its value maps each field name to its value.

Here is a complete example:

{
  "type": "generator_spout",
  "spout_settings": {
    "messages": [
      { "logs": {  "log": "my first log" }},
      { "logs": {  "foo1": "bar", "foo2": "baar", "foo3": "baaar" }},
      { "other": {  "log": "Here I am on another stream!" }}
    ]
  },
  "storm_settings": {
    "component": "generator"
  }
}
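The Python sketch below shows how a list of such messages can be read as (stream, fields) pairs, with the object key as the stream and its value as the field map. It is an illustration only, not the spout's code. It assumes each entry holds exactly one stream key, and the helper name multi_stream_tuples is hypothetical.

```python
import json

# The "messages" array from the example above.
MESSAGES = json.loads("""
[
  { "logs": { "log": "my first log" } },
  { "logs": { "foo1": "bar", "foo2": "baar", "foo3": "baaar" } },
  { "other": { "log": "Here I am on another stream!" } }
]
""")


def multi_stream_tuples(messages):
    """Hypothetical helper: turn {stream: {field: value}} entries into (stream, fields)."""
    tuples = []
    for entry in messages:
        # Assumption of this sketch: exactly one stream key per message entry.
        if len(entry) != 1:
            raise ValueError(f"expected exactly one stream key, got {list(entry)}")
        stream, fields = next(iter(entry.items()))
        tuples.append((stream, fields))
    return tuples


for stream, fields in multi_stream_tuples(MESSAGES):
    print(f"{stream}: {sorted(fields)}")
```

The same stream ("logs") can carry different field names from one message to the next, which is why this mode does not rely on a single "publish" declaration.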

Load generation

If you need a large number of messages but do not want to copy-paste thousands of lines, use the "messages_count" setting to set the total number of messages to generate.

The provided "messages" list is reused over and over until the requested number of messages has been emitted.

By default, the spout waits 1 second between two messages. For faster emission, use the "interval" setting, which gives the (approximate) number of milliseconds to wait between two generated messages. A value of 0 makes the spout emit as fast as it can.
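The reuse and pacing rules above can be sketched in Python as a generator. It cycles through the "messages" list until "messages_count" items have been produced and sleeps "interval" milliseconds between two of them. This illustrates the documented behavior and is not the spout's source. The function name generate is hypothetical, and the real spout's timing is only approximate.

```python
import itertools
import time


def generate(messages, messages_count, interval_ms=1000):
    """Hypothetical sketch: yield messages_count items, reusing the list in order."""
    if not messages:
        raise ValueError("messages must not be empty")
    cycled = itertools.islice(itertools.cycle(messages), messages_count)
    for i, message in enumerate(cycled):
        # Wait between two messages (not before the first one); 0 means no pause.
        if i > 0 and interval_ms > 0:
            time.sleep(interval_ms / 1000.0)
        yield message


# Seven messages from a three-entry list, emitted as fast as possible.
print(list(generate(["a", "b", "c"], messages_count=7, interval_ms=0)))
# prints ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```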

If you need some variation between the messages generated from the fixed "messages" list, include the %{message_num} special tag in your message strings. It is replaced by the message number, starting at 1.
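Here is a Python sketch of the tag substitution. It assumes the tag is replaced in every string value of a message, including values nested inside JSON messages as in the load-generator example on this page. It also assumes numbering counts emitted messages across repetitions of the list rather than positions within it, which is what that example relies on to produce distinct ids. The helper names render and numbered_messages are hypothetical, and object keys are deliberately left untouched in this sketch.

```python
import itertools

TAG = "%{message_num}"


def render(template, message_num):
    """Hypothetical helper: replace the tag in every string value of a (nested) message."""
    if isinstance(template, str):
        return template.replace(TAG, str(message_num))
    if isinstance(template, dict):
        # Keys are kept as-is; only values are rendered.
        return {key: render(value, message_num) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, message_num) for value in template]
    return template  # numbers, booleans and None are left unchanged


def numbered_messages(messages, messages_count):
    """Cycle through messages; numbers start at 1 and keep counting across cycles."""
    cycled = itertools.islice(itertools.cycle(messages), messages_count)
    return [render(msg, num) for num, msg in enumerate(cycled, start=1)]


template = {"logs": {"log": "## LOG %{message_num} ##", "_ppf_id": "msg-%{message_num}"}}
for msg in numbered_messages([template], messages_count=3):
    print(msg["logs"]["_ppf_id"], "|", msg["logs"]["log"])
```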

This example load-generator topology sends one million log documents, each with a distinct document id and content (here, to load an Elasticsearch cluster):

{
  "tenant": "validation-kafka",
  "channel": "kafka",
  "name": "single",
  "spouts": [
    {
      "type": "generator_spout",
      "spout_settings": {
        "messages_count" : 1000000,
        "interval" : 0,
        "messages": [
          {
            "logs": {
              "log": "## LOG %{message_num} ##",
              "_ppf_id": "msg-%{message_num}"
            }
          }
        ]
      },
      "storm_settings": {
        "component": "generator"
      }
    }
  ],
  "bolts": [
    {
      "type": "elasticsearch_bolt",
      "bolt_settings": {
        "cluster_id": "es_search",
        "reindex_failed_documents" : true,
        "error_index" : {
          "type" : "daily",
          "prefix" : "mytenant-events-indexation-errors-"
        },
        "per_stream_settings": [
          {
            "stream": "logs",
            "index": {
              "type": "daily",
              "prefix": "mytenant-events-"
            },
            "document_value_fields": ["log"],
            "document_id_field" : "_ppf_id",

            "additional_document_value_fields": [
              {
                "type": "date",
                "document_field": "@timestamp",
                "format": "iso"
              }
            ]
          }
        ]
      },
      "storm_settings": {
        "component": "elasticsearch_bolt",
        "subscribe": [
          {
            "component": "generator",
            "stream": "logs"
          }
        ]
      }
    }
  ],
  "storm_settings" : {
        "topology.worker.childopts" : "-Xmx1G -Xms1G",
        "xtopology.max.spout.pending" : 30000
  }
}

Parameters

  • interval: Number (default: 1000)

OPTIONAL: Interval of time, in milliseconds, between two consecutive messages. Its default value is 1000 (1 second).

  • messages_count: Number

OPTIONAL: To generate a large number of messages, set "messages_count". The generator then sends the messages repeatedly until the requested count is reached. Its default value is the number of entries in "messages", so each message is sent once.
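The defaults listed above can be summarized by a small Python sketch that reads a "spout_settings" object and fills in the missing optional values. The function name resolve_spout_settings, the assumption that "messages" is required, and the non-negative checks are all illustrative assumptions, not the spout's actual validation.

```python
def resolve_spout_settings(spout_settings):
    """Hypothetical helper: apply the documented defaults for optional settings."""
    # Assumption of this sketch: "messages" is required.
    messages = spout_settings["messages"]
    # interval: milliseconds between two messages, 1000 (1 second) by default.
    interval_ms = spout_settings.get("interval", 1000)
    # messages_count: defaults to the number of provided messages (each sent once).
    messages_count = spout_settings.get("messages_count", len(messages))
    if interval_ms < 0 or messages_count < 0:
        raise ValueError("interval and messages_count must be non-negative")
    return {"messages": messages, "interval": interval_ms, "messages_count": messages_count}


print(resolve_spout_settings({"messages": ["my first log", "My second message log"]}))
```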