HTTP Input

The HTTP spout is similar to the TCP Syslog spout, but instead of bytes, HTTP request bodies are the base unit. The HTTP spout reads logs from the HTTP stream and injects them into the topology.

Here is a complete configuration example.

{
  "type": "http_spout",
  "settings": {
    "rcv_queue_size": 10000,
    "listen": {
      "host": "0.0.0.0",
      "port": 9999,
      "compression": false
    }
  },
  "component": "spout",
  "publish": [
    {
      "stream": "logs",
      "fields": [
        "log",
        "http_uri",
        "http_user_agent",
        "http_content_type"
      ]
    }
  ]
}
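To try such a configuration, a client only needs to send a log as the body of an HTTP request. The following is a minimal sketch, not a reference client: it assumes the spout accepts a POST on the configured port, and the path "/", the header values and the helper names (build_log_request, send_log) are hypothetical choices made for illustration.

```python
import json
import urllib.request


def build_log_request(host: str, port: int, log: dict, path: str = "/") -> urllib.request.Request:
    """Build a POST request whose body is a single JSON log document."""
    body = json.dumps(log).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "User-Agent": "example-client/1.0",  # hypothetical value
        },
        method="POST",
    )


def send_log(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage, assuming a spout listening on port 9999 as in the example above:
#   status = send_log(build_log_request("localhost", 9999, {"message": "hello world"}))
```

With the example configuration, the request body would be published in the log field, and the request path, User-Agent and Content-Type in the http_uri, http_user_agent and http_content_type fields.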

Compression

The HTTP spout supports two compression modes. If you use the compression property, compression is performed at the socket level using Netty ZLib compression. If you use the http_compression parameter instead, compression is performed as part of the HTTP frame.

Note

Netty compression is the most efficient, but it works only if the peer is a PunchPlatform HTTP spout. If you send your data to a standard HTTP server such as a Logstash daemon, use HTTP compression instead.
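As an illustration of compression carried in the HTTP frame, here is a sketch of a client gzip-compressing a request body and declaring it with the standard Content-Encoding header. This is an assumption: this page does not state which encoding the http_compression mode expects, so check it against your deployment. The helper name gzip_body is hypothetical.

```python
import gzip
import json
from typing import Dict, Tuple


def gzip_body(log: dict) -> Tuple[bytes, Dict[str, str]]:
    """Return a gzip-compressed JSON body and the matching HTTP headers."""
    raw = json.dumps(log).encode("utf-8")
    compressed = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",
        # Assumption: standard HTTP content encoding, not confirmed by this page.
        "Content-Encoding": "gzip",
    }
    return compressed, headers
```

The socket-level Netty ZLib mode cannot be reproduced this way, since it compresses the whole connection rather than individual HTTP bodies.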

Streams and Fields

The HTTP spout emits tuples with one or more fields into the topology. One of the fields contains the input data, as read on the socket. You can name that field as you wish. The other fields are optional and carry the HTTP request metadata, the remote and local socket IP addresses, the remote and local socket port numbers, the local timestamp (set at reception) and a unique id. This is summarized by the next illustration:

image

  • log: String

    the JSON document received by the HttpSpout as the request body

  • http_uri: String

    HTTP path uri (e.g. ‘/path/to/resource?q=a&bool=true’)

  • http_user_agent: String

    HTTP user-agent (e.g. ‘curl/7.54.0’)

  • http_content_type: String

    HTTP content-type (e.g. ‘application/x-www-form-urlencoded’)

  • local_uuid: String

    a unique log id

  • local_host: String

    the local host

  • local_port: Integer

    the local port

  • remote_host: String

    the remote host

  • remote_port: Integer

    the remote port

  • local_timestamp: Integer

    the local timestamp (set by the HttpSpout when it received the log)
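The relation between these fields and the "fields" list of the publish section can be sketched as follows. This is only an illustration under assumptions: it supposes that the emitted tuple carries the values of the published fields in declaration order, and build_tuple as well as all sample values are hypothetical.

```python
from typing import Any, Dict, List


def build_tuple(values: Dict[str, Any], published_fields: List[str]) -> List[Any]:
    """Return the values of the published fields, in declaration order."""
    missing = [name for name in published_fields if name not in values]
    if missing:
        raise KeyError(f"unknown field(s): {missing}")
    return [values[name] for name in published_fields]


# Everything the spout knows about one received request (sample values).
request_values = {
    "log": '{"message": "hello world"}',
    "http_uri": "/path/to/resource?q=a&bool=true",
    "http_user_agent": "curl/7.54.0",
    "http_content_type": "application/json",
    "local_uuid": "example-uuid",
    "local_host": "0.0.0.0",
    "local_port": 9999,
    "remote_host": "192.0.2.10",
    "remote_port": 51234,
    "local_timestamp": 1500000000000,  # unit is an assumption
}

# Same field list as in the configuration example.
published = ["log", "http_uri", "http_user_agent", "http_content_type"]
emitted = build_tuple(request_values, published)
```

Only the fields you list are emitted, so optional fields such as remote_host cost nothing unless you publish them.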

Parameters

  • host: String

    The listening host address. Accepted values are dotted-decimal IP addresses or hostnames. Use "0.0.0.0" to listen on all interfaces.

  • port: Integer

    The listening port.

  • punchlet: String

    An optional punchlet file that will be run on the data. The punchlet file is looked up in the channel configuration directory.

  • decode_as_json: String "smart"

    Incoming strings can be decoded as JSON documents if they have a JSON structure. Use "smart", "never" or "always" to tell the spout what to do. With "smart", incoming strings are decoded if they match a JSON object structure. With "always", the input is expected to be a stringified JSON document. With "never", the incoming string is put as is in an initially created document.

  • json_header: String "message"

    The name of the JSON header used to embed the received data. That is, if you receive "hello world" on the socket, the spout will emit a Storm tuple with value "{"message" : "hello world"}".

  • proto: String "tcp"

    The listening protocol: "udp", "tcp", "lumberjack" or "http".

  • load_control: String "none"

    If set to "rate", the spout will limit the rate to a specified value. You can use this to limit the incoming or outgoing rate of data from/to external systems. See the related "load_control.adaptative" and "load_control.rate" properties.

  • load_control.rate: long 10000

    Only relevant if load_control is set to "rate". Limits the rate to this number of messages per second.

  • load_control.adaptative: boolean false

    If true, the load control does not strictly limit the traffic to load_control.rate messages per second, but allows less or more as long as the topology is not overloaded. Overload is detected by monitoring the Storm tuple traversal time. If that traversal time increases (which happens quickly as soon as the topology is overloaded), the load controller reduces the allowed rate. This option makes it easy to protect your topology without capping it at an overly conservative rate, since the maximum sustainable rate is difficult, if not impossible, to estimate.

  • multiline: boolean false

    If true, the spout will aggregate subsequent event lines using a prefix such as "\t". Refer to the other multiline properties.

  • multiline.regex: String """

    Sets the regex used to determine whether an event line is an initial or a subsequent line.

  • multiline.delimiter: String " ""

    Once completed, the aggregated event line is made up of the sequence of its individual lines. You can insert a line delimiter between them to make further parsing easier.

  • multiline.timeout: long 1000

    If a single event line is received, it will be forwarded downstream after this timeout, expressed in milliseconds.

  • info_log_period: Integer 3600

    An info event is generated at regular intervals, indicating the success or failure of the tuples' journey. This parameter lets you change that period, expressed in seconds.

  • ssl: String false

    Set to true to listen on SSL. If true, you must provide a certificate and a private key.

  • ssl_certificate: String

    An X.509 certificate file in PEM format.

  • ssl_private_key: String

    A PKCS#8 private key file in PEM format. This file must be available on the Storm worker host. If you have another format, refer to the explanation below.

  • ssl_trusted_certificate: String

    A chained file in PEM format containing the trusted certificates.

  • ssl_secret: String

    An optional secret associated with the key, if it is password protected.

  • secret: String

    The secret used to load a TLS PKCS12 SSL context.

  • stream_id: String "default"

    The Storm stream for emitting tuples. This setting must be in line with the spout stream publishing intent. Do not change this setting for default topologies relying on the intent to publish on the "default" stream.

  • field_id: String "default"

    The Storm field for emitting the tuple content. This setting must be in line with the spout field publishing intent. Do not change this setting for default topologies, relying on the intent to publish on the "default" field.

  • direct_response: boolean true

    If true, the HttpSpout responds to the client immediately after the Storm ack. If false, the HttpSpout subscribes to a whiteboard when it receives data, and the callback associated with the subscription must be called by a WhiteBoardBolt to respond to the client. Be careful: the HttpSpout and the WhiteBoardBolt must share the same worker.

  • rcv_queue_size: Integer 10000

    Each configured listening address has one running thread, which reads incoming data and stores it in a queue that is in turn consumed by the spout executor(s). This parameter sets the size of that queue. If the queue is full, the socket is no longer read, which in turn slows down the sender. Use this to give more or less capacity to absorb bursts without impacting the senders.
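The interplay between decode_as_json and json_header described above can be sketched as follows. This is not the spout's actual implementation: the function name decode_input is hypothetical, and two behaviours are assumptions (in "smart" mode, input that is not a JSON object is wrapped under json_header, and in "always" mode, invalid JSON raises an error).

```python
import json
from typing import Any


def decode_input(data: str, decode_as_json: str = "smart", json_header: str = "message") -> Any:
    """Turn a received string into the document carried by the emitted tuple."""
    if decode_as_json == "never":
        # Put the string as is in a freshly created document.
        return {json_header: data}
    if decode_as_json == "always":
        # The input is expected to be stringified JSON; invalid input raises ValueError.
        return json.loads(data)
    if decode_as_json == "smart":
        try:
            document = json.loads(data)
        except ValueError:
            return {json_header: data}  # assumption: fall back to wrapping
        if isinstance(document, dict):
            return document
        return {json_header: data}  # not a JSON object, so wrap it
    raise ValueError(f"unsupported decode_as_json value: {decode_as_json!r}")
```

With the defaults, receiving hello world yields a document with a single message key, while a body that is already a JSON object is passed through unchanged.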

SSL/TLS

To learn more about encryption options, refer to the dedicated SSL/TLS configuration section.

Metrics

See metrics_http_spout