Http Input¶
The HTTP spout is similar to the TCP Syslog spout, but instead of bytes, http requests bodies are the base unit. The http spout reads logs for the http stream and inject them in the topology.
Here is a complete configuration example.
{
"type": "http_spout",
"settings": {
"rcv_queue_size": 10000,
"listen": {
"host": "0.0.0.0",
"port": 9999,
"compression": false
}
},
"component": "spout",
"publish": [
{
"stream": "logs",
"fields": [
"log",
"http_uri",
"http_user_agent",
"http_content_type"
]
}
]
}
}
Compression¶
The HTTP supports two compression mode. If you use the compression property, compression will be performed at the socket level using the Netty ZLib compression. If instead you use the http_compression parameter, compression is performed as part of HTTP frame.
Note
Netty compression is most efficient, but will work only if the peer is a PunchPlatform HTTP spout. If you send your data to a standard HTTP server such as a Logstash daemon, use http compression instead.
Streams And fields¶
The HTTP spout emits in the topology a tuple with one or up to 7 fields. One of the field contains the input line, as read on the socket. You can name that field the way you want. The other fields are optional and used to vehicle (respectively) the remote (local) socket IP address, the remote (local) socket port number, the local timestamp (settled at reception) and a unique id. This is summarized by the next illustration:
log
: Stringthe json document, received by the HttpSpout as a body request
http_uri
: StringHTTP path uri (e.g. ‘/path/to/resource?q=a&bool=true’)
http_user_agent
: StringHTTP user-agent (e.g. ‘curl/7.54.0’)
http_content_type
: StringHTTP content-type (e.g. ‘application/x-www-form-urlencoded’)
local_uuid
: Stringa unique log id
local_host
: Stringthe local host
local_port
: Integerthe local port
remote_host
: Stringthe remote host
remote_port
: Integerthe remote port
local_timestamp
: Integerthe local timestamp (settled by the HttpSpout when it received the log)
Parameters¶
host
: Stringthe listening host address. Accepted values are dotted number IP addresses, or hostname. Use "0.0.0.0" to listen to all interfaces.
port
: IntegerThe listening port.
punchlet
: Stringan optional punchlet file, that will be run on the data. The punchlet file will be looked for in the channel configuration directory
decode_as_json
: String "smart"Incoming strings can be decoded as json document, if they have a json structure. Use "smart", "never" or "always" do tell the spout what to do. With smart, incoming Strings are decoded if they match a json object structure. With always, the input is expected to be a json stringified document. With "never", the incoming String will be put as is in an initially created document.
json_header
: String "message"the name of the Json header used to embed the received data. That is, if you receive "hello world" on the socket, the spout will emit a storm tuple with value "{"message" : "hello world"}".
proto
: String "tcp"The listening protocol, "udp", "tcp", "lumberjack" or "http"
load_control
: String "none"If set to "rate", the spout will limit the rate to a specified value. You can use this to limit the incoming or outgoing rate of data from/to external systems. Check the "load_control.adaptative" and "load_control.rate" related properties
load_control.rate
: long 10000Only relevant if load_control is set to "rate". Limit the rate to this number of message per seconds.
load_control.adaptative
: long falseIf true, the load control will not limit the traffic to load_control.rate message per second, but to less or more as long as the topology is not overloaded. Overload situation is determined by monitoring the Storm tuple traversal time. If that traversal time increases (which occurs quickly as soon as the topology is overloaded), the load controller will limit the allowed rate. This option makes it easy to protect your topology without limiting its rate to a too conservative rate, as it is difficult if not impossible to estimate the maximum rate.
multiline
: boolean falseIf true, the spout will aggregate subsequent event line using a prefix such as "t". Refer to the other multiline properties.
multiline.regex
: String """set the regex used to determine if a event line is an initial or subsequent line.
multiline.delimiter
: String " ""Once completed the aggregated event line is made up from the sequence of each line. You can insert a line delimiter to make further parsing easier.
multiline.timeout
: long 1000If a single event line is received, it will be forwarded downstream after this timeout, expressed in milliseconds.
info_log_period
: Integer 3600Regularly, an info event will be generated indicating the success or failure of tuples journey. This parameter lets you change that period, expressed in seconds
ssl
: String falseSet to true to listen on SSL. If true, you must provide a certificate and a private key.
ssl_certificate
: StringAn X.509 certificate file in PEM format
ssl_private_key
: StringA PKCS#8 private key file in PEM format. This file must be available on the Storm worker host. If you have another format, refer to the explanation below.
ssl_trusted_certificate
: StringAn chained file in PEM format for trusted certificates
ssl_secret
: StringAn optional secret associated to the key if it is password protected.
secret
: StringThe secret to load a TLS PKCS12 SSL context
stream_id
: String "default"The Storm stream for emitting tuples. This setting must be in line with the spout stream publishing intent. Do not change this setting for default topologies relying on the intent to publish on the "default" stream.
field_id
: string "default"The Storm field for emitting the tuple content. This setting must be in line with the spout field publishing intent. Do not change this setting for default topologies, relying on the intent to publish on the "default" field.
direct_response
: boolean trueIf true, HttpSpout respond to the client immediately after the Storm ack. If false, the HttpSpout subscribe to a whiteboard when it receives a data and the callback associated to the subscription needs to be called by a WhiteBoardBolt to respond to the client. Be careful: the HttpSpout and the WhiteBoardBolt need to share the same worker.
rcv_queue_size
: Integer 10000To each configured listening address corresponds one running thread, which reads in lines and stores them in a queue, in turn consumed by the syslog input executor(s). This parameter sets the size of that queue. If the queue is full, the socket will not be read anymore, which in turn will slow down the sender. Use this to give more or less capacity to accept burst without impacting the senders.
SSL/TLS¶
To learn more about encryption possibilities, refer to this SSL/TLS configurations dedicated section.
Metrics¶
See metrics_http_spout