Syslog Spout

The syslog spout gives you a number of ways to read data from external applications. It reads lines from a UDP or TCP socket. When TCP is used, the carriage return character delimits lines.

The socket-level configuration items are defined in the listen subsection of the spout configuration. Here is an example of a TCP server listening on all network interfaces on port 9999:

   {
     "type" : "syslog_spout",
     "spout_settings" : {
       "load_control" : "none",
       "listen" : { "proto" : "tcp", "host" : "0.0.0.0", "port" : 9999 }
     },
     "storm_settings" : { ... }
   }
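To try out a TCP listener such as the one above, a small test client can be handy. The following is a minimal sketch, not part of the product: the function names `frame_lines` and `send_lines` are hypothetical, and the default delimiter (a line feed) is an assumption of this sketch. Pass whichever line terminator your spout actually expects.

```python
import socket


def frame_lines(lines, delimiter=b"\n"):
    """Encode each log line as UTF-8 and terminate it with the delimiter.

    The delimiter default is an assumption of this sketch, not a documented value.
    """
    return b"".join(line.encode("utf-8") + delimiter for line in lines)


def send_lines(lines, host="127.0.0.1", port=9999, delimiter=b"\n"):
    """Open a TCP connection to the spout, send all lines, then close.

    Returns the number of bytes sent.
    """
    payload = frame_lines(lines, delimiter)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
    return len(payload)
```

With a topology running locally, calling `send_lines(["first log", "second log"])` should deliver two lines to the spout listening on port 9999.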

The listen section can take the following parameters:

"listen" : {
   # the listening protocol. Can be "tcp" or "udp"
   "proto" : "tcp",

   # the listening host address. Use "0.0.0.0" to listen on all interfaces
   "host" : "0.0.0.0",

   # the listening port number
   "port" : 9999,

   # Enable compression. If true, clients must be configured with compression enabled
   # as well
   "compression" : false,

   # Enable SSL.
   # Not mandatory, false by default
   "ssl" : true,

   # Private key and certificate.
   # Mandatory if ssl is enabled.
   "ssl_private_key" : "/opt/keys/punchplatform.key.pkcs8",
   "ssl_certificate" : "/opt/keys/punchplatform.crt",

   # Close inbound connections if the client sends no data.
   # Use 0 to never timeout.
   # Not mandatory: 0 by default
   "read_timeout_ms" : 30000,

   # Only for udp
   # Not mandatory: false by default
   # If true, the NioDatagramChannel is created for ipv4; otherwise the default constructor (ipv6) is used
   "ipv4" : true,

   # Only for udp
   # Not mandatory: 2048 by default
   # If defined, it sets the maximum size of a udp packet
   "max_line_length" : 2048,

   # Only for udp
   # Not mandatory: 2048000 by default.
   # Size (in bytes) of the socket-level UDP reception buffers.
   # Should more data arrive before the spout reads it, you will experience data loss. This is, of course, the way UDP works.
   # This value cannot exceed the maximum value set at OS level (sysctl net.core.rmem_max)
   "recv_buffer_size" : 350000,

   # Not mandatory: 1000 by default.
   # The spout uses an internal per-spout queue to provide its data to the topology. This makes it possible to accept bursts
   # of data without hitting the maximum pending tuples configured for the spout, and to read the socket as fast as possible
   # without depending on the Storm upcall. The bigger the queue, the better, but the more RAM you will need.
   # When using UDP, this value should be large enough to absorb a burst of logs (i.e. it should be consistent with the EPS expected on this syslog server)
   "queue_size" : 2000
 }
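The constraints stated in the comments above (allowed protocols, SSL requiring a key and a certificate, UDP-only options) can be checked before deploying a topology. The following is a minimal sketch of such a check, assuming the listen section has already been loaded into a Python dictionary. The function `check_listen` is hypothetical and is not part of the product.

```python
def check_listen(listen):
    """Return a list of problems found in a 'listen' section (empty if none).

    Only the rules spelled out in the parameter comments are checked.
    """
    problems = []
    proto = listen.get("proto")
    if proto not in ("tcp", "udp"):
        problems.append('"proto" must be "tcp" or "udp"')

    # The key and the certificate are mandatory as soon as ssl is enabled.
    if listen.get("ssl", False):
        for key in ("ssl_private_key", "ssl_certificate"):
            if not listen.get(key):
                problems.append('"%s" is mandatory when "ssl" is true' % key)

    # These options are documented as "Only for udp".
    for key in ("ipv4", "max_line_length", "recv_buffer_size"):
        if key in listen and proto != "udp":
            problems.append('"%s" only applies to udp' % key)

    return problems
```

An empty list means that none of these rules is violated. It does not guarantee that the spout will accept the configuration.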

SSL

For testing, you can generate a certificate and a key using:

$ openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout punchplatform.key -out punchplatform.crt

The Syslog spout expects the private key in PKCS8 format. Use the following command to convert a non-PKCS8 key to PKCS8:

$ openssl pkcs8 -topk8 -nocrypt -in punchplatform.key -out punchplatform.key.pkcs8

Multiline

The Syslog spout supports multiline reading: it aggregates subsequent log lines into a single line. A subsequent line is detected using either a regex or a starts-with prefix. Because aggregation may have to wait for further lines, you need to set a timeout so that what has been received is emitted if the sender keeps its TCP socket open but does not send a subsequent line for a long time.

Finally, you can also set a delimiter string if you need a special tag to separate the aggregated lines in downstream processing.

   {
     "type" : "syslog_spout",
     "spout_settings" : {
       ...
       "multiline" : true,
       "multiline.startswith" : "\t",
       "multiline.delimiter" : " ",
       "multiline.timeout" : 1000
     },
     "storm_settings" : { ... }
   }

Warning

The aggregated line still contains the text matched by the regex or the starting prefix. These are not replaced. Should you need to remove or replace them, you can use a downstream punchlet.
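The aggregation behaviour described in this section can be illustrated with a short sketch. It is not the spout's implementation. The function `aggregate` is hypothetical, the use of `re.search` for the regex case is an assumption, and the timeout is not modelled because the input here is a finished list rather than a live socket.

```python
import re


def aggregate(lines, startswith=None, regex=None, delimiter=""):
    """Merge continuation lines into the line that precedes them.

    A line is a continuation if it starts with `startswith` or matches `regex`.
    The prefix or matched text is kept as-is in the aggregated line.
    """
    pattern = re.compile(regex) if regex is not None else None

    def is_continuation(line):
        if startswith is not None and line.startswith(startswith):
            return True
        return pattern is not None and pattern.search(line) is not None

    records = []
    for line in lines:
        if records and is_continuation(line):
            # Append to the previous record, separated by the delimiter.
            records[-1] = records[-1] + delimiter + line
        else:
            records.append(line)
    return records
```

With `startswith="\t"` and `delimiter=" "`, as in the configuration above, a Java stack trace whose frames start with a tab is merged into a single record, and each tab is still present in the result.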

Streams and fields

The Syslog spout emits into the topology a tuple with one to six fields. One of the fields contains the input line, as read from the socket. You can name that field the way you want. The other fields are optional; see the lumberjack settings.

Protocols, SSL and Compression

The available protocols are the following:

  • “udp” : each line is received as a UDP packet. Hence the line length is limited to 64K. Multiline is not supported on UDP.
  • “tcp” : TCP sockets can be configured with SSL/TLS support, and/or socket level compression

To configure an SSL spout endpoint you need a certificate and a key. The Syslog spout expects the private key in PKCS8 format. For testing, you can generate a certificate and a key as follows:

> openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout punchplatform.key -out punchplatform.crt
> openssl pkcs8 -topk8 -nocrypt -in punchplatform.key -out punchplatform.key.pkcs8

You then simply add the certificate and the key to your listening address:

 "spouts" : [
    {
        "type" : "syslog_spout",
        "spout_settings" : {
          "listen" : [
            {
              "proto" : "tcp",
              "host" : "0.0.0.0",
              "port" : 9999,
              "ssl" : true,
              "ssl_private_key" : "/opt/keys/punchplatform.key.pkcs8",
              "ssl_certificate" : "/opt/keys/punchplatform.crt"
            }
          ]
        }
    }
]
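To exercise an SSL-enabled listener with the self-signed test certificate, a client has to trust that certificate explicitly. The sketch below is for testing only and makes several assumptions: the function names are hypothetical, the line-feed delimiter is an assumption, and hostname checking is disabled because a certificate generated with -batch does not carry the server's host name.

```python
import socket
import ssl


def make_test_context(cert_file=None):
    """Build a client TLS context that trusts the given certificate file.

    Hostname checking is turned off (test setup only); the certificate
    itself is still verified.
    """
    ctx = ssl.create_default_context(cafile=cert_file)
    ctx.check_hostname = False
    return ctx


def send_tls_lines(lines, host, port, cert_file, delimiter=b"\n"):
    """Send log lines over a TLS connection. Returns the number of bytes sent."""
    payload = b"".join(line.encode("utf-8") + delimiter for line in lines)
    ctx = make_test_context(cert_file)
    with socket.create_connection((host, port), timeout=5) as raw:
        with ctx.wrap_socket(raw) as tls:
            tls.sendall(payload)
    return len(payload)
```

For example, `send_tls_lines(["hello"], "127.0.0.1", 9999, "/opt/keys/punchplatform.crt")` would target the listener configured above. For production use, keep hostname checking enabled and use a properly issued certificate.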

The following picture depicts the configurable protocol stack you can plug in by configuration. The important reported metrics are also highlighted.

../../../../_images/SyslogStack.png