
SyslogSpout

The syslog spout reads data sent by external applications. It reads lines from a UDP or TCP socket. When TCP is used, the newline character (0a in hex) delimits lines.

The socket level configuration items are defined in a listen subsection of the spout configuration. Here is an example of a TCP server listening on all network interfaces on port 9999:

{
  "type": "syslog_spout",
  "spout_settings": {
    "load_control": "none",
    "listen": {
      "proto": "tcp",
      "host": "0.0.0.0",
      "port": 9999
    }
  },
  "storm_settings": {
    ...
  }
}

Info

Refer to the syslog javadoc documentation.
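
To try out a TCP listener such as the one configured above, a small client only needs to terminate each log line with a newline character. The following Python sketch is an illustration, not part of the platform: the helper names frame_lines and send_lines are hypothetical, and the target address is an assumption matching the example port 9999.

```python
import socket


def frame_lines(lines):
    """Encode log lines as newline-delimited bytes (one 0x0a byte per line)."""
    return b"".join(line.rstrip("\n").encode("utf-8") + b"\n" for line in lines)


def send_lines(host, port, lines, timeout=5.0):
    """Open a TCP connection, send the framed lines, then close the socket."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame_lines(lines))
```

Assuming a spout listens locally on port 9999, calling send_lines("127.0.0.1", 9999, ["first log", "second log"]) would deliver two lines to it.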

Streams and Fields

The syslog spout receives log lines. Each line is emitted in the topology as a tuple field. Here is how you name the corresponding stream and field:

{
  "type": "syslog_spout",
  "spout_settings": {
    ...
  },
  "storm_settings": {
    "component": "syslog_spout_tcp",
    "publish": [
      # the incoming log lines will be emitted as a single-value Tuple, 
      # having a "log" field, and emitted on the stream "logs" 
      {
        "stream": "logs",
        "fields": [
          "log"
        ]
      }
    ]
  }
}

The syslog spout can also be configured to emit additional information to keep track of the sender and receiver addresses, as well as a unique identifier and a timestamp. You add these by listing the following fields:

{
  "type": "syslog_spout",
  "spout_settings": {
    ...
  },
  "storm_settings": {
    "component": "syslog_spout_tcp",
    "publish": [
      # the incoming log lines are emitted on the stream "logs"; each Tuple 
      # carries the "log" field plus any of the optional fields listed below 
      {
        "stream": "logs",
        "fields": [
          "log",
          # the peer source ip address
          "remote_host",
          # the peer source port
          "remote_port",
          # the local listening ip address
          "local_host",
          # the local listening port
          "local_port",
          # a unique identifier
          "local_uuid",
          # the receiving time
          "local_timestamp"
        ]
      }
    ]
  }
}

You can add none, some or all of these additional fields. Reserved streams and fields are documented in the javadoc.
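
Since field names are plain strings, a typo in the publish section is easy to miss. The following Python sketch checks a storm_settings dictionary against the field names listed on this page. It is a hypothetical helper, not a platform tool, and the set of names is an assumption limited to what this page mentions: the javadoc may document other reserved fields.

```python
# Field names mentioned on this page only (assumption: the javadoc may list more).
KNOWN_FIELDS = {
    "log", "remote_host", "remote_port", "local_host", "local_port",
    "local_uuid", "local_timestamp", "_ppf_latency",
}


def unknown_fields(storm_settings):
    """Return the (stream, field) pairs whose field is not in KNOWN_FIELDS."""
    problems = []
    for entry in storm_settings.get("publish", []):
        for field in entry.get("fields", []):
            if field not in KNOWN_FIELDS:
                problems.append((entry.get("stream"), field))
    return problems


settings = {
    "component": "syslog_spout_tcp",
    "publish": [
        {"stream": "logs", "fields": ["log", "remote_host", "local_timestamp"]}
    ],
}
```

An empty result from unknown_fields(settings) means every published field matches a name documented here.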

Latency Tracker Tuple

Besides emitting the received data, you can configure the syslog spout to periodically generate and emit a so-called punchplatform latency tracker tuple. That tuple traverses your channel and keeps track of the traversal time at each spout and bolt it goes through. Here is how you configure the spout to emit such a record every 30 seconds:

{
  "type": "syslog_spout",
  "spout_settings": {
    ...
    "self_monitoring.activation": true,
    "self_monitoring.period": 30
  },
  "storm_settings": {
    "component": "syslog_spout_tcp",
    "publish": [
      {
        "stream": "logs",
        "fields": [
          "log"
        ]
      },
      # the latency tracker record must be emitted on these reserved stream/field.
      {
        "stream": "_ppf_metrics",
        "fields": [
          "_ppf_latency"
        ]
      }
    ]
  }
}

Refer to the monitoring guide to use the resulting measured latency to monitor your channel.

Multiline

The syslog spout supports multiline reading: it then aggregates subsequent log lines. A subsequent line is detected using either a regex or a starts-with prefix. Aggregating multiple lines may require waiting for the last ones, hence you need to set a timeout so that what has been received is emitted should the sender stop sending subsequent lines for a long time while keeping its TCP socket open.

Lastly, you can also set a delimiter string should you require a special tag to separate your lines in downstream processing.

{
  "type": "syslog_spout",
  "spout_settings": {
    "multiline": true,
    "multiline.startswith": "\t",
    "multiline.delimiter": " ",
    "multiline.timeout": 1000,
    ...
  },
  "storm_settings": {
    ...
  }
}

Warning

The aggregated line contains the matching regex or starting prefix. These are not replaced. Should you need to do that, you can use a downstream punchlet.
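
The aggregation behaviour can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the spout implementation: the function name aggregate_multiline is hypothetical, the delimiter is assumed to be inserted between each pair of joined lines, only the starts-with variant is covered, and the timeout is not modelled.

```python
def aggregate_multiline(lines, startswith="\t", delimiter=" "):
    """Join continuation lines (those starting with `startswith`) onto the
    preceding line, inserting `delimiter` between the joined parts."""
    aggregated = []
    for line in lines:
        if aggregated and line.startswith(startswith):
            # Continuation line: the prefix is kept, it is not replaced.
            aggregated[-1] = aggregated[-1] + delimiter + line
        else:
            # Initial line (or a continuation with nothing before it).
            aggregated.append(line)
    return aggregated


logs = [
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Foo.bar(Foo.java:42)",
    "\tat com.example.Main.main(Main.java:7)",
    "next log line",
]
result = aggregate_multiline(logs)
```

With these inputs, result holds two entries: the exception with its two stack frames joined by spaces (tabs preserved), followed by "next log line".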

Optional Parameters

  • load_control

    String : "none"

    When load control is activated, the spout limits the rate to a specified value. You can use this to limit the incoming or outgoing rate of data from or to external systems. Check the related properties below.

  • load_control.rate

    Long : 10000

    Only relevant if load_control is set to limit the rate: the rate is limited to this number of messages per second.

  • load_control.adaptative

    Boolean : false

    If true, the load control will not strictly limit the traffic to load_control.rate messages per second, but to less or more as long as the topology is not overloaded. An overload situation is detected by monitoring the Storm tuple traversal time.

  • queue_size

    Integer : 100000

    Each configured listening address has one running thread, which reads lines and stores them in a queue that is in turn consumed by the spout executor(s). This parameter sets the size of that queue. If the queue is full, the socket is no longer read, which in turn slows down the sender. Use this to give more or less capacity to accept bursts without impacting the senders.

  • listen.max_line_length

    Integer : 1048576

    This is the maximum frame size in TCP mode: if incoming data exceeds this amount without an end-of-line delimiter (\n) being received, an exception may occur and is reported in the topology worker log file, e.g.:

    2019-07-01 17:22:34.707 c.t.s.c.p.c.n.i.NettyServerDataHandler [INFO] message="socket level exception" cause=frame length (over 1052924) exceeds the allowed maximum (1048576)
    2019-07-01 17:22:34.708 STDIO [ERROR] io.netty.handler.codec.TooLongFrameException: frame length (over 1052924) exceeds the allowed maximum (1048576)
    2019-07-01 17:22:34.709 STDIO [ERROR] at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:146)

    This setting is to be provided in the 'listen' section of the spout settings.

  • multiline

    Boolean : false

    If true, the spout aggregates subsequent log lines, detected using a regex or a starts-with prefix. Refer to the other multiline properties.

  • multiline.regex

    String

    Sets the regex used to determine whether a log line is an initial or a subsequent line.

  • multiline.delimiter

    String

    Once completed, the aggregated log line is made up of the sequence of its lines. You can insert a line delimiter between them to make further parsing easier.

  • multiline.timeout

    Long : 1000

    If a single log line is received, it will be forwarded downstream after this timeout, expressed in milliseconds.
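
The queue_size behaviour described above (a full queue stops the socket reads, which slows the sender down) can be pictured with a bounded queue. The following Python sketch uses only the standard library and is an analogy, not the spout's actual Java implementation; the helper name offer is hypothetical.

```python
import queue


def offer(q, line):
    """Try to enqueue a line; return False when the bounded queue is full."""
    try:
        q.put_nowait(line)
        return True
    except queue.Full:
        return False


# A tiny queue stands in for queue_size; a real setting would be much larger.
q = queue.Queue(maxsize=2)
accepted = [offer(q, f"line {i}") for i in range(3)]

# Once a consumer (the spout executor in the analogy) drains one entry,
# the reader can enqueue again.
q.get_nowait()
accepted_after_drain = offer(q, "line 3")
```

Here the third line is refused until one entry has been consumed. A larger queue absorbs longer bursts at the cost of more memory.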

Metrics

See metrics_syslog_spout

SSL/TLS

To learn more about encryption possibilities, refer to the dedicated SSL/TLS configuration section.

Socket Configuration

The listen section can take the following parameters:

"listen" : {
   # the listening protocol. Can be "tcp" or "udp"
   "proto" : "tcp",

   # the listening host address. Use "0.0.0.0" to listen on all interfaces
   "host" : "0.0.0.0",

   # the listening port number
   "port" : 9999,

   # Enable compression. If true, clients must be configured with compression enabled
   # as well
   "compression" : false,

   # Enable SSL.
   # Not mandatory, false by default
   "ssl" : true,

   # Select your encryption provider, "OPENSSL" or "JDK"
   "ssl_provider": "JDK",

   # key and Certificates.
   # Mandatory if ssl is enabled.
   "ssl_private_key" : "/opt/keys/punchplatform.key.pkcs8",
   "ssl_certificate" : "/opt/keys/punchplatform.crt",

   # Close inbound connections if the client sends no data.
   # Use 0 to never timeout.
   # Not mandatory: 0 by default
   "read_timeout_ms" : 30000,

   # Only for udp
   # Not mandatory: false by default
   # if true, a NioDatagramChannel with ipv4 is used, else the default constructor (ipv6)
   "ipv4" : true,

   # Only for udp
   # Not mandatory: 2048 by default
   # if defined, it sets the max size of a udp packet
   "max_line_length" : 2048,

   # Only for udp
   # Not mandatory: 2048000 by default. 
   # Size (in bytes) of the socket level UDP reception buffers.
   # Should more data arrive before the spout reads it, you will 
   # experience data loss. This is, of course, the way UDP works. 
   # This value cannot exceed the maximum value set at OS level 
   # (sysctl  net.core.rmem_max)
   "recv_buffer_size" : 350000,

   # Not mandatory : 1000 by default.
   # the spout uses an internal per spout queue to provide its 
   # data to the topology. This makes it possible to accept burst
   # of data, without hitting the maximum pending tuples configured 
   # for the spout, as well as reading the socket as fast as 
   # possible, without depending on the Storm upcall. Of course 
   # the bigger the better, but the more RAM you will need. 
   # When using UDP, this value should be able to receive a burst 
   # of logs (i.e. should be consistent with the EPS expected on 
   # this syslog server)
   "queue_size" : 2000
 }

Info

  • udp : each line is received as a UDP packet. Hence the line length is limited to 64K. Multiline is not supported on UDP.
  • tcp : TCP sockets can be configured with SSL/TLS support, and/or socket level compression
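
In UDP mode each log line travels in its own datagram, so a client may want to check line sizes before sending. The following Python sketch is a hypothetical client-side helper: the names to_datagrams and send_udp are illustrative, the 2048-byte default mirrors the max_line_length default quoted in the listen section, and how the spout treats oversized datagrams is not stated on this page.

```python
import socket


def to_datagrams(lines, max_line_length=2048):
    """Encode each line as one UDP payload, rejecting lines over the limit."""
    payloads = []
    for line in lines:
        payload = line.encode("utf-8")
        if len(payload) > max_line_length:
            raise ValueError(
                f"line of {len(payload)} bytes exceeds {max_line_length}"
            )
        payloads.append(payload)
    return payloads


def send_udp(host, port, lines, max_line_length=2048):
    """Send every line as a separate datagram (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in to_datagrams(lines, max_line_length):
            sock.sendto(payload, (host, port))
```

Unlike the TCP case, no newline terminator is added here, because the datagram boundary already delimits the line.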

The following picture summarises the configurable protocol stack you can plug in by configuration. The important reported metrics are also highlighted.

image