HOWTO cope with lumberjack inactive sockets
Why do that
The PunchPlatform LTR to LMR configuration is key to transporting data reliably from one system or site to another. To support data shipping across sites, lumberjack connections are used, combining Lumberjack bolts and spouts. A Lumberjack bolt transfers logs to a Lumberjack spout using a reliable, acknowledged protocol, so as to make sure all logs are properly processed.
In between, you can have lossy networks and firewalls, resulting in socket timeouts and zombie connections. To react correctly to these, you have several configuration options, described next.
What to do
Configure socket applicative keep alive¶
The Lumberjack bolt accepts a keep alive configuration. It makes the bolt periodically send a keep alive lumberjack message to the spout, and close the connection should the spout be too slow to respond with the corresponding ack message.
This keep alive message is harmless and invisible to your processing chain. Here are typical settings:
{
     "type" : "lumberjack_output",
     "settings" : {
         "destination" : [
           {
               "host" : "target.ip.address",
               "port" : 9999,
                ...
               # Use a keep alive applicative message exchange to make sure the server is alive
               # Here we send such a keep alive message every 30 seconds
               "keep_alive_interval" : 30,
               # and we give 20 seconds to the server to send us back the corresponding acknowledgement.
               # If it is not received within that interval, the socket is closed.
               "keep_alive_timeout" : 20
           }
         ]
     },
     "storm_settings" : { ... }
 }
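The timing behaviour of these two bolt settings can be illustrated with a small Python model. This is a hypothetical sketch, not the PunchPlatform implementation: the `KeepAliveMonitor` class, its methods and the `simulate` loop are made up for the example, and we assume the interval is counted from the moment the previous keep alive was sent.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveMonitor:
    """Hypothetical model of bolt-side keep alive timing (not PunchPlatform code)."""

    interval_s: float = 30.0  # plays the role of "keep_alive_interval"
    timeout_s: float = 20.0   # plays the role of "keep_alive_timeout"
    last_sent: float = 0.0    # assumption: connection opened at t=0
    awaiting_ack: bool = False
    closed: bool = False

    def tick(self, now: float) -> str:
        """Return what the client should do at time `now`: 'send', 'close', 'wait' or 'closed'."""
        if self.closed:
            return "closed"
        if self.awaiting_ack:
            if now - self.last_sent >= self.timeout_s:
                self.closed = True  # no ack in time: treat the server as a zombie
                return "close"
            return "wait"
        if now - self.last_sent >= self.interval_s:
            self.last_sent = now
            self.awaiting_ack = True
            return "send"
        return "wait"

    def on_ack(self) -> None:
        """Called when the acknowledgement of the pending keep alive arrives."""
        self.awaiting_ack = False


def simulate(acks: bool, until: float) -> list:
    """Drive the monitor with 1-second ticks; `acks` says whether the server answers."""
    monitor = KeepAliveMonitor()
    events = []
    t = 0.0
    while t <= until:
        action = monitor.tick(t)
        if action in ("send", "close"):
            events.append((t, action))
        if action == "send" and acks:
            monitor.on_ack()  # a healthy server acknowledges immediately
        if action == "close":
            break
        t += 1.0
    return events


print(simulate(acks=True, until=100))   # [(30.0, 'send'), (60.0, 'send'), (90.0, 'send')]
print(simulate(acks=False, until=100))  # [(30.0, 'send'), (50.0, 'close')]
```

In this model, a server that stops answering is detected at most about `keep_alive_interval + keep_alive_timeout` seconds later, that is 50 seconds with the settings above.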
On the server (spout) side, you have two options. The first is to configure a socket read timeout: if no data is received within that delay, the configured action is taken, typically closing the corresponding socket.
{
  "type" : "lumberjack_input",
  "settings" : {
      "listen" : [
          {
             "host" : "target.ip.address",
             "port" : "9999",
             "compression" : true,
             "ssl" : true,
             "ssl_private_key" : "/opt/keys/punchplatform.key.pkcs8",
             "ssl_certificate" : "/opt/keys/punchplatform.crt",
             #
             # If no data is received for 30 seconds, then perform the read_timeout_action
             "read_timeout_ms" : 30000,
              # In case a zombie client is detected, close the socket.
              # An alternative "exit" action makes the whole JVM exit, so that the topology is restarted altogether
             "read_timeout_action" : "close"
          }]
  },
  "storm_settings" : { ... }
}
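The spout-side read timeout can be sketched in Python with a plain socket timeout. Again, this is only an illustration under assumptions: `serve_connection` is a hypothetical helper, the timeout is applied to each individual `recv()` call, and the "exit" action is approximated with `SystemExit` rather than a JVM exit.

```python
import socket


def serve_connection(conn: socket.socket, read_timeout_ms: int, action: str = "close") -> bytes:
    """Read from `conn` until EOF, or until no data arrives for `read_timeout_ms`.

    Hypothetical sketch of "read_timeout_ms" / "read_timeout_action";
    not the PunchPlatform spout implementation.
    """
    conn.settimeout(read_timeout_ms / 1000.0)  # applies to every blocking recv() call
    received = bytearray()
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer closed the connection cleanly
                break
            received.extend(chunk)
    except socket.timeout:
        if action == "exit":
            # stand-in for the JVM exit that lets the topology restart
            raise SystemExit("zombie client detected, exiting")
        # action == "close": drop this connection only (done in finally)
    finally:
        conn.close()
    return bytes(received)


server_side, client_side = socket.socketpair()
client_side.sendall(b"log line 1\n")
# The client now goes silent without closing: a zombie from the server's point of view.
data = serve_connection(server_side, read_timeout_ms=200)
print(data)  # b'log line 1\n'
client_side.close()
```

Because the timeout is armed on every read, any incoming bytes reset the countdown in this sketch; only a connection that stays completely silent for `read_timeout_ms` is dropped.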
An alternative solution is to make the spout use the keep alive mechanism.
Warning
Watch out: the keep alive option is supported starting with avishai-3.3.5. You can only use it after you have upgraded all your LTRs. We recommend using this option only on PunchPlatform brad installed systems.