Lumberjack spout

The Lumberjack spout is similar to the TCP Syslog spout, but instead of raw bytes, lumberjack frames are the base unit. The spout reads logs from the lumberjack stream and injects them into the topology. A lumberjack frame is a binary-encoded set of key-value pairs. It makes it possible to send map structures over a TCP socket, and to benefit from per-frame acknowledgement.
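
A frame's key-value payload is conceptually equivalent to a flat JSON map; for instance (the field names and values below are purely illustrative):

   {
     "log" : "Dec 19 14:02:51 host sshd[2428]: Accepted password for user",
     "host" : "forwarder-01",
     "file" : "/var/log/auth.log"
   }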

Here is a complete configuration example. The listen section is identical to the Syslog spout one. The Lumberjack spout accepts additional options, in particular keepalives, used to periodically send a keep alive message to the clients and check for connection liveness.

Note

Two options are supported to react to dead peer sockets: the read_timeout_ms property or the keep_alive properties.

Here is an example using the read socket timeout.

   {
     "type" : "lumberjack_spout",
     "spout_settings" : {
         "listen" : [
             {
                "host" : "target.ip.address",
                "port" : "9999",
                "compression" : true,
                "ssl" : true,
                "ssl_private_key" : "/opt/keys/punchplatform.key.pkcs8",
                "ssl_certificate" : "/opt/keys/punchplatform.crt",

                 # If no data is received for 30 seconds, then perform the read_timeout_action.
                 "read_timeout_ms" : 30000,

                 # In case a zombie client is detected, close the socket.
                 # An alternative "exit" action makes the whole JVM exit, so as to restart the topology altogether.
                 "read_timeout_action" : "close",

                 # If the spout receives a window frame, it acknowledges only the window.
                 # Otherwise the spout acknowledges every lumberjack frame.
                 # Defaults to "auto", but it can be forced to "window" or "frame".
                 "ack_control" : "auto"

             }]
     },
     "storm_settings" : { ... }
   }

Alternatively, you can use a keep alive mechanism as follows:

   {
     "type" : "lumberjack_spout",
     "spout_settings" : {
         "listen" : [
             {
                "host" : "target.ip.address",
                ...

                # send keep alive message every 30 seconds
                "keep_alive_interval" : 30,

                # if the corresponding ack is not received after 20 seconds, close the socket
                "keep_alive_timeout" : 20

             }]
     },
     "storm_settings" : { ... }
   }

Warning

Support for keepalive messages sent from the spout to the bolt has only been implemented starting with version Avishai v3.3.5. Do not use it if your LTRs are running an older version.

SSL

For testing, you can generate a certificate and keys using:

$ openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout punchplatform.key -out punchplatform.crt

The Lumberjack spout expects the private key in PKCS8 format. Use the following command to convert a non-PKCS8 key to PKCS8:

$ openssl pkcs8 -topk8 -nocrypt -in punchplatform.key -out punchplatform.key.pkcs8
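
The generated certificate and converted key can then be referenced from the listen section, as in the first configuration example above (here assuming the files have been moved under /opt/keys):

   "ssl" : true,
   "ssl_private_key" : "/opt/keys/punchplatform.key.pkcs8",
   "ssl_certificate" : "/opt/keys/punchplatform.crt"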

Streams and fields

The Lumberjack spout emits into the topology a tuple with the fields received in Lumberjack, plus 6 optional fields. These optional fields are used to convey (respectively) the remote (local) socket IP address, the remote (local) socket port number, a unique id and a timestamp. This is summarized by the next illustration:

[Figure: LumberjackSpoutContract — the lumberjack fields plus the 6 optional fields emitted by the spout]

You are free to select any of the 6 optional fields; by default they must be named “remote_host”, “remote_port”, “local_host”, “local_port”, “local_uuid” and “local_timestamp”. You must also explicitly name the lumberjack field(s) you wish to map as Storm tuple fields.
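
For instance, assuming the forwarder sends a single "log" field, an emitted tuple conceptually carries values such as these (the values are purely illustrative, not an exact wire format):

   {
     "log" : "<the received lumberjack payload>",
     "remote_host" : "172.16.0.12",
     "remote_port" : "41230",
     "local_host" : "target.ip.address",
     "local_port" : "9999",
     "local_uuid" : "<a unique id>",
     "local_timestamp" : "<the reception timestamp>"
   }

Here is an example to process data from a Logstash forwarder: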

       spout_settings" : { ... },
       "storm_settings" : {
         "executors": 1,
         "component" : "a_syslog_spout",
         "publish" : [
           {
             "stream" : "logs",
             "fields" : [
               "log",
               "remote_host",
               "remote_port",
               "local_host",
               "local_port",
               "local_uuid",
               "local_timestamp"
             ]
           }
         ]
       }
   }

You can rename the fields in the spout_settings configuration, but don't forget to also change the names in the Storm publish section:

       spout_settings" : {
         "local_host_field" : "new_local_host",
         "remote_host_field" : "new_remote_host",
         "remote_port_field" : "new_remote_port",
         "local_port_field" : "new_local_port",
         "local_uuid_field" : "new_local_uuid",
         "local_timestamp_field" : "new_local_timestamp"
       },
       "storm_settings" : {
         "executors": 1,
         "component" : "a_syslog_spout",
         "publish" : [
           {
             "stream" : "logs",
             "fields" : [
               "log",
               "new_remote_host",
               "new_remote_port",
               "new_local_host",
               "new_local_port",
               "new_local_uuid",
               "new_local_timestamp"
             ]
           }
         ]
       }
   }

Compression

The PunchPlatform Lumberjack spout supports two compression modes. If you use the compression parameter, compression is performed at the socket level using the Netty ZLib compression. If instead you use the lumberjack_compression parameter, compression is performed as part of the Lumberjack frames themselves.
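
For example, to compress inside the lumberjack frames rather than at the socket level, a listen entry would presumably look like the sketch below; the lumberjack_compression parameter name comes from this section, but its boolean value is an assumption by analogy with the compression parameter:

   {
      "host" : "target.ip.address",
      "port" : "9999",

      # assumed to take a boolean, by analogy with the socket level "compression" parameter
      "lumberjack_compression" : true
   }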

Note

Netty compression is the most efficient, but will only work if the peer is a PunchPlatform Lumberjack spout. If you send your data to a standard Lumberjack server such as a Logstash daemon, use the lumberjack compression instead.