The Lumberjack input node is similar to the TCP Syslog input node, but instead of bytes, lumberjack frames are the base unit.
A lumberjack frame is a binary-encoded set of key-value pairs. It makes it possible to send map structures over a TCP socket and to benefit from per-frame acknowledgement.
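To make the frame layout concrete, here is a minimal Python sketch of a data frame and its acknowledgement. It assumes the publicly documented Lumberjack v1 wire format (version byte `1`, frame type `D` or `A`, big-endian 32-bit integers); the punch node may differ in details, and the function names are illustrative only.

```python
import struct


def encode_data_frame(seq: int, fields: dict[str, str]) -> bytes:
    """Encode key/value pairs as one Lumberjack v1 data frame (assumed layout)."""
    # Header: version '1', type 'D', sequence number, number of pairs.
    out = [b"1D", struct.pack(">II", seq, len(fields))]
    for key, value in fields.items():
        k, v = key.encode("utf-8"), value.encode("utf-8")
        # Each key and each value is prefixed by its length in bytes.
        out.append(struct.pack(">I", len(k)) + k)
        out.append(struct.pack(">I", len(v)) + v)
    return b"".join(out)


def decode_data_frame(frame: bytes) -> tuple[int, dict[str, str]]:
    """Decode a data frame back into (sequence number, fields)."""
    if frame[:2] != b"1D":
        raise ValueError("not a v1 data frame")
    seq, count = struct.unpack_from(">II", frame, 2)
    offset = 10  # 2 header bytes + two 32-bit integers
    fields = {}
    for _ in range(count):
        parts = []
        for _ in range(2):  # read the key, then the value
            (length,) = struct.unpack_from(">I", frame, offset)
            offset += 4
            parts.append(frame[offset:offset + length].decode("utf-8"))
            offset += length
        fields[parts[0]] = parts[1]
    return seq, fields


def encode_ack(seq: int) -> bytes:
    """Acknowledgement frame sent back by the receiver for a sequence number."""
    return b"1A" + struct.pack(">I", seq)
```

A receiver would decode each frame, hand the map to the next node, and reply with `encode_ack(seq)` once the data is safely processed.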
Here is a minimal configuration example.
    - type: lumberjack_input
      component: input
      settings:
        listen:
          host: 0.0.0.0
          port: 5052
          compression: false
      publish:
        - stream: logs
          fields:
            - log
The listen section is identical to that of the Syslog input. The Lumberjack input accepts additional options, in particular keep-alives, which periodically send a keep-alive message to the clients to check that the connection is still alive.
Two options are supported to react to dead peer sockets: the read_timeout_ms property or the keep-alive mechanism.
Here is a complete example.
    - type: lumberjack_input
      component: input
      settings:
        listen:
          host: 0.0.0.0
          port: 5052
          compression: false
          drop_if_queue_full: false
          queue_flush_size: 1000
          queue_flush_interval_ms: 3000
          ssl: true
          ssl_provider: JDK
          ssl_private_key: /opt/keys/punchplatform.key.pkcs8
          ssl_certificate: /opt/keys/ca.pem

          # If no data is received for 30 seconds, then perform the read_timeout_action.
          read_timeout_ms: 30000

          # In case a zombie client is detected, close the socket.
          # An alternative "exit" action makes the whole JVM exit, so as to
          # restart the topology altogether.
          read_timeout_action: close

          # You can also decide what to do should a tuple failure occur. The default
          # strategy ("close") consists in closing the peer socket. The peer client
          # will reconnect and resume sending its traffic. This is overall the best
          # strategy because it is very unlikely to suffer from just one data item
          # failure. In most cases you have lots of failures because of a problem
          # writing the data to a next destination (kafka, elasticsearch, etc.).
          # Closing the socket and causing a socket reconnection is in the end as
          # robust as just sending the fail message for one particular data item.
          # It is just as good to configure the action as "exit". In that case, the
          # entire JVM will exit then be restarted. It gives more time for an
          # overloaded system to recover.
          fail_action: close

          # If the node receives a window frame then it will acknowledge only the window.
          # Else the node will acknowledge all lumberjack frames.
          # By default "auto" but it can be fixed to "window" or "frame".
          ack_control: auto
Alternatively, you can use a keep-alive mechanism as follows:
    # send a keep-alive message every 30 seconds
    keep_alive_interval: 30
    # if the corresponding ack is not received after 20 seconds, close the socket
    keep_alive_timeout: 20
Keep-alives are available only from version Avishai v3.3.5 onwards. Do not use them if your punch collectors (LTRs) are running older versions.
To learn more about encryption options, refer to the dedicated TLS configuration section.
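The ack_control comment in the complete example above can be read as the decision logic sketched below. This is only one plausible interpretation: it assumes Lumberjack v1 semantics, where a window frame announces how many data frames follow and a single acknowledgement carrying the last sequence number covers the whole window. The function name and the tuple representation of frames are hypothetical and not part of the punch API.

```python
def acks_for(frames, ack_control="auto"):
    """Return the sequence numbers to acknowledge for a list of received frames.

    frames: list of ("W", window_size) or ("D", sequence_number) tuples.
    ack_control: "auto", "window" or "frame".
    """
    acks = []
    window = 0   # size announced by the last window frame, 0 if none seen
    pending = 0  # data frames still expected before the window is complete
    for kind, value in frames:
        if kind == "W":
            if ack_control != "frame":
                window = pending = value
            continue
        # From here on, the frame is a data frame.
        per_window = ack_control == "window" or (ack_control == "auto" and window > 0)
        if per_window:
            pending -= 1
            if pending <= 0:
                acks.append(value)  # one ack for the whole window
                pending = window
        else:
            acks.append(value)      # one ack per data frame
    return acks
```

With `auto`, a sender that announces a window of 3 gets a single acknowledgement for the third frame, while a sender that never announces a window gets one acknowledgement per frame.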
Streams and fields
The Lumberjack input node emits into the topology a tuple containing the fields received in the Lumberjack frame plus 6 optional fields. The optional fields carry the remote and local socket IP addresses, the remote and local socket port numbers, a unique id and a timestamp. This is summarized by the next illustration:
You are free to select any of the 6 optional reserved fields:
- _ppf_timestamp : the number of milliseconds since 1/1/1970, timestamp of entry of the document in PunchPlatform
- _ppf_id : a unique id identifying the document, allocated at entry in PunchPlatform
- _ppf_remote_host : the address of the sender of the document to the PunchPlatform
- _ppf_remote_port : the tcp/udp port of the sender
- _ppf_local_host : the address of the punchplatform receiver of the document
- _ppf_local_port : the tcp/udp port on which the document was received by PunchPlatform
Normally, these fields are first generated at the punch entry point (in a SyslogInput for example). Therefore, if these fields are present in the incoming Lumberjack message, their values are preserved and emitted in the output fields with the same names. This applies only if the corresponding publish stream is defined.
If these fields are not present in the incoming Lumberjack message, then the lumberjack input node will generate default values for these fields, assuming that this node is the actual punch entry point.
For compatibility with BRAD LTRs, the input node will take care of translating BRAD-style 'standard' tuple fields into CRAIG reserved fields:
- local_timestamp ==> _ppf_timestamp
- local_uuid ==> _ppf_id
- local_port ==> _ppf_local_port
- local_host ==> _ppf_local_host
- remote_port ==> _ppf_remote_port
- remote_host ==> _ppf_remote_host
Therefore, if a tuple is received with the log unique id in the 'local_uuid' field, and the '_ppf_id' field is published to the Storm stream by the LumberjackSpout storm_settings, then the output value will be the one received in local_uuid.
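The resolution rules described above can be sketched in Python as follows. This is an illustration, not the node's actual code: the function name, the `socket_info` argument and the precedence given to an existing `_ppf_*` value over a BRAD-style one are assumptions.

```python
import time
import uuid

# BRAD-style field names and their reserved equivalents, as listed above.
LEGACY_TO_RESERVED = {
    "local_timestamp": "_ppf_timestamp",
    "local_uuid": "_ppf_id",
    "local_port": "_ppf_local_port",
    "local_host": "_ppf_local_host",
    "remote_port": "_ppf_remote_port",
    "remote_host": "_ppf_remote_host",
}


def resolve_published_fields(frame, published, socket_info):
    """Build the emitted tuple for the field names listed under publish.

    frame: key/value pairs decoded from the Lumberjack frame.
    published: field names declared in the publish section.
    socket_info: hypothetical dict of locally known socket values,
                 keyed by reserved field name.
    """
    reserved_to_legacy = {v: k for k, v in LEGACY_TO_RESERVED.items()}
    out = {}
    for name in published:
        legacy = reserved_to_legacy.get(name)
        if name in frame:
            out[name] = frame[name]        # value set upstream is preserved
        elif legacy is not None and legacy in frame:
            out[name] = frame[legacy]      # BRAD-style field is translated
        elif name == "_ppf_id":
            out[name] = str(uuid.uuid4())  # default: this node is the entry point
        elif name == "_ppf_timestamp":
            out[name] = int(time.time() * 1000)
        elif name in socket_info:
            out[name] = socket_info[name]  # default taken from the socket
    return out
```

For example, a frame carrying `local_uuid` with `_ppf_id` published yields an output `_ppf_id` equal to the received `local_uuid`, and fields that are not listed in `published` are never emitted.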
You must also explicitly name the lumberjack fields you wish to map to Storm tuple fields. Here is an example to process data from a Logstash forwarder or from a PunchPlatform LTR.
    type: lumberjack_input
    settings:
      ...
    component: lumberjack_input
    publish:
      - stream: logs
        fields:
          - log
          - _ppf_remote_host
          - _ppf_remote_port
          - _ppf_local_host
          - _ppf_local_port
          - _ppf_id
          - _ppf_timestamp
If set to , the node will limit the rate to a specified value. You can use this to limit the incoming or outgoing rate of data from/to external systems. Check the related properties.
Only relevant if load_control is set to . Limits the rate to this number of messages per second.
The lumberjack node supports two compression modes. If you use the compression parameter, compression is performed at the socket level using Netty ZLib compression. If instead you use lumberjack compression, it is performed as part of the Lumberjack frames.
Netty compression is the most efficient, but it works only if the peer is a punch Lumberjack node. If you send your data to a standard Lumberjack server such as a Logstash daemon, use lumberjack compression instead.
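To illustrate what compression "as part of Lumberjack frames" means, the sketch below wraps already encoded frames into a compressed frame. It assumes the public Lumberjack v1 layout (version `1`, type `C`, a big-endian 32-bit payload length, then a zlib-compressed payload), which is what a standard server such as Logstash expects; the helper names are illustrative.

```python
import struct
import zlib


def compress_frames(frames: bytes) -> bytes:
    """Wrap one or more encoded frames into a single v1 compressed frame."""
    payload = zlib.compress(frames)
    return b"1C" + struct.pack(">I", len(payload)) + payload


def decompress_frame(frame: bytes) -> bytes:
    """Return the raw frames carried by a v1 compressed frame."""
    if frame[:2] != b"1C":
        raise ValueError("not a v1 compressed frame")
    (length,) = struct.unpack_from(">I", frame, 2)
    return zlib.decompress(frame[6:6 + length])
```

Socket-level compression, by contrast, compresses the whole byte stream regardless of frame boundaries, which is why both peers must agree on it.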