HTTP Spout
The HTTP spout is similar to the TCP Syslog spout. Instead of reading raw bytes from a socket, however, it uses HTTP request bodies as its base unit. The HTTP spout reads logs from the HTTP stream and injects them into the topology.
Here is a complete configuration example.
```json
{
  "type": "http_spout",
  "spout_settings": {
    "listen": {
      "host": "0.0.0.0",
      "port": "9999",
      "compression": true
    }
  },
  "storm_settings": {
    ...
  }
}
```
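Because each HTTP request body becomes one log, any HTTP client can feed the spout. The sketch below uses only the Python standard library to build and send one JSON log as a POST request. It assumes the spout listens on `localhost:9999` as in the configuration above, accepts a POST on the root path `/`, and has socket-level `compression` disabled: as explained in the compression section, Netty compression only works when the peer is a PunchPlatform HTTP spout, so a plain HTTP client like this one cannot talk to a listener with `"compression": true`. The helper names `build_log_request` and `send_log` are hypothetical.

```python
import json
import urllib.request


def build_log_request(log: dict, host: str = "localhost", port: int = 9999,
                      path: str = "/") -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is one JSON log.

    Assumption: the spout accepts POST requests on `path`; adjust as needed.
    """
    body = json.dumps(log).encode("utf-8")  # one request body == one log
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_log(log: dict, **kwargs) -> int:
    """Send one log to the spout and return the HTTP status code."""
    req = build_log_request(log, **kwargs)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

With a spout running, `send_log({"message": "hello"})` posts a single log; calling it in a loop sends one log per request.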
Compression
The HTTP spout supports two compression modes. If you set the compression property, compression is performed at the socket level using Netty ZLib compression. If you set the http_compression property instead, compression is performed as part of the HTTP frame.
Note
Netty compression is more efficient, but works only if the peer is a PunchPlatform HTTP spout. If you send your data to a standard HTTP server such as a Logstash daemon, use HTTP compression instead.
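To make the HTTP-level mode concrete, the sketch below compresses a request body inside the HTTP message itself rather than at the socket. It assumes that HTTP compression means a gzip-encoded body announced with a standard `Content-Encoding: gzip` header; the exact encoding that `http_compression` produces or accepts is not specified here, so check it against your peer before relying on it. The function name `build_compressed_request`, the host, port, and path are illustrative assumptions.

```python
import gzip
import json
import urllib.request


def build_compressed_request(log: dict, host: str = "localhost",
                             port: int = 9999,
                             path: str = "/") -> urllib.request.Request:
    """Build a POST request whose JSON body is gzip-compressed.

    Assumption: the receiver decodes bodies marked Content-Encoding: gzip.
    """
    raw = json.dumps(log).encode("utf-8")
    body = gzip.compress(raw)  # compression lives in the HTTP payload
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",  # tells the peer how to decode the body
        },
        method="POST",
    )
```

Unlike socket-level Netty compression, this framing is visible to any HTTP server, which is why it is the option to choose when the peer is not a PunchPlatform spout.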
Streams and Fields
The HTTP spout emits into the topology a tuple containing one or more of the fields listed in the table below. One field contains the HTTP request body, as received by the spout; you can name that field as you like. The other fields are optional and carry the HTTP request URI, user agent and content type, the remote and local socket IP addresses, the remote and local socket port numbers, the local timestamp (set at reception), and a unique id. These fields are summarized in the following table:

Field | Type | Description |
---|---|---|
log | String | the JSON document received by the HttpSpout as the request body |
http_uri | String | HTTP path uri (e.g. ‘/path/to/resource?q=a&bool=true’) |
http_user_agent | String | HTTP user-agent (e.g. ‘curl/7.54.0’) |
http_content_type | String | HTTP content-type (e.g. ‘application/x-www-form-urlencoded’) |
local_uuid | String | a unique log id |
local_host | String | the local socket IP address |
local_port | int | the local socket port number |
remote_host | String | the remote socket IP address |
remote_port | int | the remote socket port number |
local_timestamp | int | the local timestamp (set by the HttpSpout when it received the log) |
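To illustrate what each field in the table holds, the following sketch maps one received HTTP request onto a dictionary keyed by the field names above. It is not the spout's implementation: the function `to_tuple_fields` and its parameters are hypothetical, and the timestamp unit (milliseconds since the Unix epoch) is an assumption, since the table only gives its type.

```python
import time
import uuid
from typing import Dict, Tuple, Union

FieldValue = Union[str, int]


def to_tuple_fields(body: bytes, uri: str, headers: Dict[str, str],
                    local_addr: Tuple[str, int],
                    remote_addr: Tuple[str, int]) -> Dict[str, FieldValue]:
    """Hypothetical mapping of one HTTP request to the spout's output fields."""
    # HTTP header names are case-insensitive, so normalise them first.
    lower = {k.lower(): v for k, v in headers.items()}
    return {
        "log": body.decode("utf-8"),             # the request body (JSON document)
        "http_uri": uri,                         # path plus query string
        "http_user_agent": lower.get("user-agent", ""),
        "http_content_type": lower.get("content-type", ""),
        "local_uuid": str(uuid.uuid4()),         # unique id for this log
        "local_host": local_addr[0],
        "local_port": local_addr[1],
        "remote_host": remote_addr[0],
        "remote_port": remote_addr[1],
        # Assumption: reception time in milliseconds since the Unix epoch.
        "local_timestamp": int(time.time() * 1000),
    }
```

In the actual spout only the body field is mandatory; the remaining fields appear in the emitted tuple only when you declare them.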