Generator Input
The GeneratorSpout publishes fake data. You can use it to run unit tests or to help you design topologies.
Here is a simple complete configuration example to start with.
- type: generator_input
  settings:
    expectation: none
    acked:
    messages:
      - "my first log"
      - "My second message log"
      - "And finally a third one"
  publish:
    - stream: logs
      fields:
        - log
Complete examples
The Generator Spout can work in two different ways:
- Publish on a unique stream and field
- Publish on various streams and fields
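In the first case, the "messages" setting contains an array of text logs, just like in the previous example. Each message is sent on the single stream and field declared in the "publish" section (stream logs, field log in the previous example).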
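The following minimal Python sketch illustrates the single-stream mode. It shows what the spout conceptually emits for the first example: one (stream, fields) pair per string in "messages", all on the same stream and field. The function name single_stream_tuples and the tuple shape are hypothetical illustrations. They are not the GeneratorSpout's actual code or API.

```python
# Hypothetical sketch: map every string of "messages" onto the single
# stream/field declared in "publish". Not the real GeneratorSpout code.
from typing import Dict, List, Tuple


def single_stream_tuples(
    messages: List[str], stream: str, field: str
) -> List[Tuple[str, Dict[str, str]]]:
    """Return one (stream, {field: message}) pair per message, in list order."""
    return [(stream, {field: message}) for message in messages]


if __name__ == "__main__":
    msgs = ["my first log", "My second message log", "And finally a third one"]
    # Stream "logs" and field "log" come from the "publish" section above.
    for stream_name, fields in single_stream_tuples(msgs, "logs", "log"):
        print(stream_name, fields)
```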
In the second case, the "messages" property is an array of JSON objects. Note that no "publish" key has been defined in the storm_settings section. Instead, each message fully defines its stream and fields: the top-level key is the stream name, and the nested keys are the field names.
Here is a complete example:
{
  type: generator_input
  component: generator
  settings: {
    messages: [
      {
        logs: {
          log: "my first log"
        }
      }
      {
        logs: {
          foo1: bar
          foo2: baar
          foo3: baaar
        }
      }
      {
        other: {
          log: "Here I am on another stream!"
        }
      }
    ]
  }
}
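For the multi-stream mode, here is a minimal Python sketch of how each {stream: {field: value}} message splits into a stream name and a set of fields. The function name multi_stream_tuples is hypothetical. The documented examples use exactly one stream key per message, so iterating over every key is an assumption made only to handle the general case.

```python
# Hypothetical sketch: each message object names its own stream (top-level
# key) and fields (nested keys). Not the real GeneratorSpout code.
from typing import Any, Dict, List, Tuple


def multi_stream_tuples(
    messages: List[Dict[str, Dict[str, Any]]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn each {stream: {field: value}} message into (stream, fields) pairs."""
    tuples: List[Tuple[str, Dict[str, Any]]] = []
    for message in messages:
        # Assumption: several stream keys in one message would each be emitted.
        for stream, fields in message.items():
            tuples.append((stream, dict(fields)))
    return tuples


if __name__ == "__main__":
    example = [
        {"logs": {"log": "my first log"}},
        {"logs": {"foo1": "bar", "foo2": "baar", "foo3": "baaar"}},
        {"other": {"log": "Here I am on another stream!"}},
    ]
    for stream_name, fields in multi_stream_tuples(example):
        print(stream_name, fields)
```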
Load generation
If you need a lot of messages but do not want to copy-paste thousands of lines, use the "messages_count" setting to set the total number of messages to generate.
The provided "messages" list is replayed in a loop until that many messages have been emitted.
By default, the spout waits 1 second between two messages. For faster emission, use the "interval" setting to provide the (approximate) number of milliseconds to wait between two generated messages. A value of 0 gives the highest rate the spout can achieve.
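The replay-and-throttle behaviour described above can be sketched in Python with itertools.cycle and itertools.islice. The generate function is a hypothetical illustration, not the spout's implementation. Its sleep parameter is injectable only so the example can be tested without waiting. Whether the real spout also waits before the very first message is not specified here, and this sketch does not.

```python
# Hypothetical sketch of load generation: cycle through "messages" until
# "messages_count" items are produced, pausing "interval" ms between them.
import itertools
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def generate(
    messages: List[T],
    messages_count: int,
    interval_ms: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield messages_count items, replaying messages in order.

    Waits roughly interval_ms milliseconds between two emissions;
    interval_ms == 0 means no wait at all (maximum speed).
    """
    replayed = itertools.islice(itertools.cycle(messages), messages_count)
    for index, message in enumerate(replayed):
        if index > 0 and interval_ms > 0:
            sleep(interval_ms / 1000.0)  # time.sleep takes seconds
        yield message


if __name__ == "__main__":
    # Seven messages from a three-item list, as fast as possible.
    print(list(generate(["a", "b", "c"], 7, 0)))
```

With a three-item list and a count of 7, the list is replayed twice and then truncated after its first element.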
If you need some variation between the messages generated from the fixed "messages" list, include the special tag %{message_num} inside your message strings. The tag is replaced by the message number (starting at 1).
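The substitution can be pictured with the following minimal Python sketch. The render function is hypothetical. It replaces the tag in string values and recurses into nested objects, so it also covers multi-stream messages. Leaving keys untouched is an assumption. The number keeps increasing across replays of the list, which the load-generation example relies on to get distinct ids from a single message template.

```python
# Hypothetical sketch of %{message_num} substitution; not the spout's code.
from typing import Any

TAG = "%{message_num}"


def render(template: Any, message_num: int) -> Any:
    """Replace every %{message_num} tag in string values, recursing into dicts."""
    if isinstance(template, str):
        return template.replace(TAG, str(message_num))
    if isinstance(template, dict):
        # Assumption: only values are rendered, keys are kept as-is.
        return {key: render(value, message_num) for key, value in template.items()}
    return template


if __name__ == "__main__":
    template = {"logs": {"log": "## LOG %{message_num} ##", "_ppf_id": "msg-%{message_num}"}}
    # Message numbers start at 1.
    for n in range(1, 4):
        print(render(template, n))
```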
Here is an example of a load-generator topology that sends one million log documents, each with a distinct document id and content (here, to load an Elasticsearch cluster):
{
  tenant: validation-kafka
  channel: kafka
  name: single
  dag: [
    {
      type: generator_input
      component: generator
      settings: {
        messages_count : 1000000
        interval : 0
        messages: [
          {
            logs: {
              log: "## LOG %{message_num} ##"
              _ppf_id: "msg-%{message_num}"
            }
          }
        ]
      }
    }
    {
      type: elasticsearch_output
      settings: {
        cluster_id: es_search
        reindex_failed_documents : true
        error_index : {
          type : daily
          prefix : mytenant-events-indexation-errors-
        }
        per_stream_settings: [
          {
            stream: logs
            index: {
              type: daily
              prefix: mytenant-events-
            }
            document_value_fields: [
              log
            ]
            document_id_field : _ppf_id
            additional_document_value_fields: [
              {
                type: date
                document_field: "@timestamp"
                format: iso
              }
            ]
          }
        ]
      }
      subscribe: [
        {
          component: generator
          stream: logs
        }
      ]
    }
  ]
  storm_settings : {
    topology.worker.childopts : "-Xmx1G -Xms1G"
    xtopology.max.spout.pending : 30000
  }
}
Parameters
interval
: Number 1000
OPTIONAL: Time to wait, in milliseconds, between the emission of two messages. Defaults to 1000 (1 second). A value of 0 emits messages as fast as the spout can.
messages_count
: Number
OPTIONAL: Total number of messages to generate. The generator replays the "messages" list until this count is reached. Defaults to the number of entries in "messages".
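The two optional parameters and their documented defaults can be summarised by the following minimal Python sketch. The function effective_settings is a hypothetical helper. Treating a missing "messages" list as empty is an assumption made only so the function is total.

```python
# Hypothetical sketch: resolve the documented defaults of the optional
# generator_input parameters. Not the spout's actual configuration code.
from typing import Any, Dict


def effective_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return settings with "interval" and "messages_count" defaults filled in."""
    messages = settings.get("messages", [])  # assumption: missing list = empty
    return {
        "messages": messages,
        # interval: milliseconds between two messages, default 1000 (1 second).
        "interval": settings.get("interval", 1000),
        # messages_count: defaults to the number of entries in "messages".
        "messages_count": settings.get("messages_count", len(messages)),
    }


if __name__ == "__main__":
    print(effective_settings({"messages": ["my first log", "My second message log"]}))
```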