
Lumberjack Input

The Lumberjack spout is similar to the TCP Syslog spout, but its base unit is the lumberjack frame instead of raw bytes.

A lumberjack frame is a binary-encoded set of key-value pairs. It makes it possible to send map structures over a TCP socket and to benefit from per-frame acknowledgement.
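For illustration, here is a minimal Python sketch of a version-1 lumberjack data frame layout as defined by the logstash-forwarder protocol (a sequence number, a pair count, then length-prefixed keys and values). The function name is ours; this is a sketch, not the spout's decoder.

```python
import struct

def encode_data_frame(sequence, fields):
    """Encode a lumberjack v1 'D' (data) frame: a big-endian sequence
    number, a pair count, then length-prefixed key/value byte strings."""
    payload = struct.pack(">II", sequence, len(fields))
    for key, value in fields.items():
        k, v = key.encode(), value.encode()
        payload += struct.pack(">I", len(k)) + k
        payload += struct.pack(">I", len(v)) + v
    return b"1D" + payload  # protocol version '1', frame type 'D'

frame = encode_data_frame(1, {"host": "ltr-1", "message": "hello"})
```

Each data frame carries its own sequence number, which is what makes the per-frame acknowledgement mentioned above possible.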

Here is a complete configuration example. The [listen] section is identical to the Syslog spout one. The lumberjack spout accepts additional options, in particular keepalives, to periodically send keep-alive messages to the clients and check for connection liveness.

Two options are supported to react to dead peer sockets: the read_timeout_ms property or the keep_alive properties.

Here is an example using the read socket timeout.

  "type": lumberjack_input",
  "settings": {
    "listen": {
      "host": "",
      "port": 9999,
      "compression": true,
      "ssl": true,
      "ssl_provider": "JDK",
      "ssl_private_key": "/opt/keys/punchplatform.key.pkcs8",
      "ssl_certificate": "/opt/keys/punchplatform.crt",
      "ssl_certificate": "/opt/keys/ca.pem",

      # If no data is received for 30 seconds, then perform the read_timeout_action
      "read_timeout_ms": 30000,

      # In case a zombie client is detected, close the socket. 
      # An alternative "exit" action makes the whole JVM exits, so as to restart the topology altogether
      "read_timeout_action": "close",

      # You can also decide what to do should a tuple failure occur. The default
      # strategy ("close") consists in closing the peer socket. The peer client will reconnect
      # and resume sending its traffic. This is overall the best strategy because it is very unlikely
      # to suffer from just one data item failure. In most case you have lots of failure because of a
      # problem writing the data to a next destination (kafka, elasticsearch, etc..). Closing the socket # and causing a socket reconnection is at the end as robust as just sending the fail message for 
      # one particular data item. 
      # It is even as good top configure the action as "exit". In that case, the entire JVM will 
      # exit then be restarted. It gives more time for an overloaded system to recover.
      "fail_action": "close",

      # If the spout receives a window frame then it will acknowledge only the window.
      # Else the spout will acknowledge all lumberjack frames.
      # By default "auto" but it can be fixed to "window" or "frame".
      "ack_control": "auto"
  "storm_settings": {

Alternatively, you can use a keep-alive mechanism as follows:

  "type": lumberjack_input",
  "settings": {
    "listen": [
        "host": "",
        # send keep alive message every 30 seconds
        "keep_alive_interval": 30,
        # if the corresponding ack is not received after 20 seconds, close the socket
        "keep_alive_timeout": 20,
  "storm_settings": {


The support for keepalives sent from the spout to the bolt has been implemented only starting with version Avishai v3.3.5. Do not use it if your LTRs are running an older version.
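The timing logic implied by the two keep-alive settings above can be sketched as follows. This is an illustrative model only, not the actual spout implementation; the function name and signature are assumptions.

```python
def keep_alive_decision(now, last_sent, last_ack, interval=30, timeout=20):
    """Decide what to do at time `now` (seconds), given when the last
    keep-alive was sent and when the last ack was received."""
    if last_ack < last_sent and now - last_sent > timeout:
        return "close"            # ack overdue: the peer is considered dead
    if now - last_sent >= interval:
        return "send_keep_alive"  # time to probe the connection again
    return "wait"
```

With the example values, a keep-alive is sent every 30 seconds, and a connection whose ack is still missing after 20 seconds is closed.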


To learn more about encryption possibilities, refer to the dedicated SSL/TLS configuration section.

Streams and fields

The Lumberjack spout emits into the topology a tuple with the fields received in the Lumberjack frame, plus up to 6 optional fields. The optional fields convey the remote and local socket IP addresses, the remote and local socket port numbers, a unique id and a timestamp. This is summarized by the next illustration:


You are free to select any of the 6 optional reserved fields among:

  • _ppf_timestamp : the number of milliseconds since 1/1/1970; the timestamp at which the document entered the PunchPlatform
  • _ppf_id : a unique id identifying the document, allocated at entry in the PunchPlatform
  • _ppf_remote_host : the address of the sender of the document to the PunchPlatform
  • _ppf_remote_port : the tcp/udp port of the sender
  • _ppf_local_host : the address of the PunchPlatform receiver of the document
  • _ppf_local_port : the tcp/udp port on which the document was received by the PunchPlatform

Normally, these fields are first generated at the entry point into the PunchPlatform (in a SyslogInput for example). Therefore, if these fields are present in the incoming Lumberjack message, their values are preserved and emitted in the output fields with the same names (provided they are published by the storm_settings configuration of the spout).

If these fields are not present in the incoming Lumberjack message, the LumberjackSpout will generate values for them (if published), assuming this spout IS the entry point into the PunchPlatform.
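The preserve-or-generate behaviour described in the two paragraphs above can be sketched as follows. This is illustrative Python, not the actual spout code; build_output and socket_info are made-up names.

```python
import time
import uuid

def build_output(incoming, published, socket_info):
    """Keep reserved fields already present in the incoming frame; generate
    values only for published fields that are missing (entry-point case)."""
    generated = {
        "_ppf_timestamp": int(time.time() * 1000),  # ms since 1/1/1970
        "_ppf_id": str(uuid.uuid4()),               # fresh unique id
        **socket_info,  # the four _ppf_*_host/_ppf_*_port fields of this socket
    }
    output = {}
    for field in published:
        if field in incoming:
            output[field] = incoming[field]   # preserved from upstream
        elif field in generated:
            output[field] = generated[field]  # this spout is the entry point
    return output
```

Note that only the fields listed in the publish configuration end up in the output tuple, whether preserved or generated.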


For compatibility with BRAD LTRs, the LumberjackSpout takes care of translating BRAD-style 'standard' tuple fields into CRAIG reserved fields:

  • local_timestamp => _ppf_timestamp
  • local_uuid => _ppf_id
  • local_port => _ppf_local_port
  • local_host => _ppf_local_host
  • remote_port => _ppf_remote_port
  • remote_host => _ppf_remote_host

Therefore, if a tuple is received with the log unique id in the 'local_uuid' field, and the '_ppf_id' field is published to the storm stream by the LumberjackSpout storm_settings, then the output value will be the one received in local_uuid.
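This translation amounts to a simple field renaming, which can be sketched as follows (illustrative code, not the actual implementation):

```python
# BRAD-style 'standard' field names mapped to their CRAIG reserved equivalents
BRAD_TO_CRAIG = {
    "local_timestamp": "_ppf_timestamp",
    "local_uuid": "_ppf_id",
    "local_port": "_ppf_local_port",
    "local_host": "_ppf_local_host",
    "remote_port": "_ppf_remote_port",
    "remote_host": "_ppf_remote_host",
}

def translate(fields):
    """Rename BRAD-style fields to CRAIG reserved fields, leaving
    any other field untouched."""
    return {BRAD_TO_CRAIG.get(key, key): value for key, value in fields.items()}
```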

You must also explicitly name the lumberjack fields you wish to map as Storm tuple fields. Here is an example to process data from a Logstash forwarder or from a PunchPlatform LTR:

  "type": lumberjack_input",
  "settings": {
  "storm_settings": {
    "executors": 1,
    "component": "a_lumberjack_spout",
    "publish": [
        "stream": "logs",
        "fields": [

Optional Parameters

  • load_control : String

    If set to "rate", the spout will limit the rate to a specified value. You can use this to limit the incoming or outgoing rate of data from/to external systems. Check the related properties.

  • load_control.rate : String

    Only relevant if load_control is set to "rate". Limits the rate to this number of messages per second.
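The effect of such a rate limit can be sketched with a simple fixed-interval throttle. This is an illustrative model only, not the actual load_control implementation; it delays the caller so that at most `rate` messages are admitted per second.

```python
import time

class RateLimiter:
    """Illustrative fixed-rate throttle: admit at most `rate` messages
    per second by spacing admissions 1/rate seconds apart."""
    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.next_slot = 0.0

    def acquire(self, now=None):
        """Block until the next admission slot; return the admission time."""
        now = time.monotonic() if now is None else now
        if now < self.next_slot:
            time.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.min_interval
        return now
```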


The PunchPlatform Lumberjack spout supports two compression modes. If you use the compression parameter, compression is performed at the socket level using Netty ZLib compression. If instead you use the lumberjack_compression parameter, compression is performed as part of the Lumberjack frames.


Netty compression is the most efficient, but will only work if the peer is a PunchPlatform Lumberjack spout. If you send your data to a standard Lumberjack server such as a Logstash daemon, use lumberjack compression instead.
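For the lumberjack-level mode, the protocol defines a compressed frame that wraps already-encoded frames in a zlib payload. Here is a sketch, assuming the standard version-1 'C' frame layout (a size prefix followed by the zlib-compressed payload); the function names are ours.

```python
import struct
import zlib

def encode_compressed_frame(inner_frames):
    """Wrap already-encoded lumberjack frames into a v1 'C' frame:
    the concatenated frames are zlib-compressed and length-prefixed."""
    compressed = zlib.compress(b"".join(inner_frames))
    return b"1C" + struct.pack(">I", len(compressed)) + compressed

def decode_compressed_frame(frame):
    """Unwrap a v1 'C' frame back into the concatenated inner frames."""
    assert frame[:2] == b"1C"
    (size,) = struct.unpack(">I", frame[2:6])
    return zlib.decompress(frame[6:6 + size])
```

Because the compression happens inside the protocol itself, any standard Lumberjack server can decode such frames, which is why this mode interoperates with Logstash.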


See metrics_lumberjack_spout

Refer to the lumberjack spout javadoc documentation.