LumberjackSpout

The Lumberjack spout is similar to the TCP Syslog spout, but instead of bytes, lumberjack frames are the base unit.

A lumberjack frame is a binary-encoded set of key-value pairs. It makes it possible to send map structures over a TCP socket, and to benefit from per-frame acknowledgement.
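
For illustration, a single data frame received from a Logstash forwarder typically decodes to a flat map of key-value pairs such as the following (the field names depend on the sender; this content is purely illustrative):

{
  "line": "Dec 10 06:55:46 sta66 sshd[2279]: Failed password for root",
  "host": "sta66",
  "file": "/var/log/auth.log"
}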

Here is a complete configuration example. The listen section is identical to the one in the Syslog spout. The lumberjack spout accepts additional options, in particular keepalives, used to periodically send keep-alive messages to the clients to check for connection liveness.

Two options are supported to react to dead peer sockets: the read_timeout_ms property or the keep_alive properties.

Here is an example using the read socket timeout.

{
  "type": "lumberjack_spout",
  "spout_settings": {
    "listen": {
      "host": "0.0.0.0",
      "port": 9999,
      "compression": true,
      "ssl": true,
      "ssl_provider": "JDK",
      "ssl_private_key": "/opt/keys/punchplatform.key.pkcs8",
      "ssl_certificate": "/opt/keys/punchplatform.crt",

      # If no data is received for 30 seconds, then perform the read_timeout_action
      "read_timeout_ms": 30000,

      # In case a zombie client is detected, close the socket.
      # An alternative "exit" action makes the whole JVM exit, so as to restart the topology altogether.
      "read_timeout_action": "close",

      # If the spout receives a window frame, it will acknowledge only the window.
      # Otherwise the spout will acknowledge every lumberjack frame.
      # The default is "auto", but it can be forced to "window" or "frame".
      "ack_control": "auto"
    }
  },
  "storm_settings": {
    ...
  }
}

Alternatively you can use the keep-alive mechanism as follows:

{
  "type": "lumberjack_spout",
  "spout_settings": {
    "listen": [
      {
        "host": "0.0.0.0",
        # send a keep-alive message every 30 seconds
        "keep_alive_interval": 30,
        # if the corresponding ack is not received within 20 seconds, close the socket
        "keep_alive_timeout": 20,
        ...
      }
    ]
  },
  "storm_settings": {
    ...
  }
}

Warning

Support for keepalives sent from the spout to the bolt is only implemented starting with version Avishai v3.3.5. Do not use this feature if your LTRs are running an older version.

SSL/TLS

To learn more about encryption possibilities, refer to the dedicated SSL/TLS configuration section.

Streams and fields

The Lumberjack spout emits into the topology a tuple with the fields received in the Lumberjack frame, plus 6 optional fields. The optional fields carry the remote and local socket IP addresses, the remote and local socket port numbers, a unique id and a timestamp.


You are free to select any of the following 6 optional reserved fields:

  • _ppf_timestamp: the number of milliseconds since 1/1/1970, set when the document entered the PunchPlatform
  • _ppf_id: a unique id identifying the document, allocated when it entered the PunchPlatform
  • _ppf_remote_host: the address of the sender of the document to the PunchPlatform
  • _ppf_remote_port: the tcp/udp port of the sender
  • _ppf_local_host: the address of the PunchPlatform receiver of the document
  • _ppf_local_port: the tcp/udp port on which the document was received by the PunchPlatform
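
For illustration, assuming the spout publishes the log field together with all six reserved fields, an emitted tuple could carry values such as these (all values made up):

{
  "log": "Dec 10 06:55:46 sta66 sshd[2279]: Failed password for root",
  "_ppf_timestamp": 1506441808404,
  "_ppf_id": "4d2e1f6a-9c3b-4b8e-a1f0-0f3c5d7e9a2b",
  "_ppf_remote_host": "172.16.0.12",
  "_ppf_remote_port": 45713,
  "_ppf_local_host": "10.0.0.5",
  "_ppf_local_port": 9999
}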

Normally, these fields are first generated at the entry point into the PunchPlatform (in a SyslogSpout for example). Therefore, if these fields are present in the incoming Lumberjack message, their values are preserved and emitted in the output fields with the same names (if published in the storm_settings configuration of the spout).

If these fields are not present in the incoming Lumberjack message, then the LumberjackSpout will generate values for them (if published), assuming this spout IS the entry point into the PunchPlatform.

Note

For compatibility with BRAD LTRs, the LumberjackSpout will take care of translating BRAD-style 'standard' tuple fields into CRAIG reserved fields:

  • local_timestamp => _ppf_timestamp
  • local_uuid => _ppf_id
  • local_port => _ppf_local_port
  • local_host => _ppf_local_host
  • remote_port => _ppf_remote_port
  • remote_host => _ppf_remote_host

Therefore, if a tuple is received with the log unique id in the 'local_uuid' field, and the '_ppf_id' field is published to the storm stream in the LumberjackSpout storm_settings, then the output value will be the one received in 'local_uuid'.
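
For example, a BRAD-style frame carrying a local_uuid would be published as follows (illustrative values), assuming _ppf_id is listed in the publish section:

# incoming BRAD-style lumberjack frame
{
  "log": "some log line",
  "local_uuid": "4d2e1f6a-9c3b-4b8e-a1f0-0f3c5d7e9a2b"
}

# tuple emitted by the LumberjackSpout
{
  "log": "some log line",
  "_ppf_id": "4d2e1f6a-9c3b-4b8e-a1f0-0f3c5d7e9a2b"
}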

You must also explicitly name the lumberjack fields you wish to map as Storm tuple fields. Here is an example to process data from a Logstash forwarder or from a PunchPlatform LTR:

{
  "type": "lumberjack_spout",
  "spout_settings": {
    ...
  },
  "storm_settings": {
    "executors": 1,
    "component": "a_lumberjack_spout",
    "publish": [
      {
        "stream": "logs",
        "fields": [
          "log",
          "_ppf_remote_host",
          "_ppf_remote_port",
          "_ppf_local_host",
          "_ppf_local_port",
          "_ppf_id",
          "_ppf_timestamp"
        ]
      }
    ]
  }
}

Optional Parameters

  • load_control: String

    If set to "rate", the spout will limit the rate to a specified value. You can use this to limit the incoming or outgoing rate of data from/to external systems. Check the related properties, and see the sketch after this list.

  • load_control.rate: String

    Only relevant if load_control is set to "rate". Limits the rate to this number of messages per second.
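
As a sketch, and assuming these properties sit at the spout_settings level next to listen (check your release documentation for the exact accepted values), a rate-limited spout would look like:

{
  "type": "lumberjack_spout",
  "spout_settings": {
    "listen": {
      "host": "0.0.0.0",
      "port": 9999
    },
    # limit the spout to 10000 messages per second
    "load_control": "rate",
    "load_control.rate": "10000"
  },
  "storm_settings": {
    ...
  }
}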

Compression

The PunchPlatform Lumberjack spout supports two compression modes. If you use the compression parameter, compression will be performed at the socket level using the Netty ZLib compression. If instead you use the lumberjack_compression parameter, compression is performed as part of Lumberjack frames.

Tip

Netty compression is the most efficient, but will only work if the peer is a PunchPlatform Lumberjack spout. If you send your data to a standard Lumberjack server such as a Logstash daemon, use lumberjack compression instead.
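
To make the two modes concrete, here is a sketch of each listen configuration; the lumberjack_compression parameter is assumed here to be a boolean like compression:

# socket-level Netty ZLib compression: use between two PunchPlatform lumberjack peers
{
  "listen": {
    "host": "0.0.0.0",
    "port": 9999,
    "compression": true
  }
}

# per-frame lumberjack compression: use with standard lumberjack peers such as Logstash
{
  "listen": {
    "host": "0.0.0.0",
    "port": 9999,
    "lumberjack_compression": true
  }
}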

Metrics

See metrics_lumberjack_spout

Refer to the lumberjack spout javadoc documentation.