Log Collector Site

Abstract

This chapter explains the recommended deployment of a resilient punch log collector.

One Node Setup

  • The collector is deployed on a single physical server.
  • If log buffering is required, a single-node Kafka broker is deployed as well.
  • A single-node Shiva cluster is in charge of starting the punchlines.
  • Punchlines are in charge of (see the sketch after this list):
    • receiving the traffic from the collected zone and saving it into Kafka
    • consuming Kafka and forwarding the traffic to the central punch site
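
The sketch below illustrates these two punchline roles with the kafka-python client. It is a conceptual illustration only: real punchlines are declared in punch configuration files, not written in Python, and the topic name, ports and forwarding stub used here are made-up placeholders.

```python
# Conceptual sketch only: real punchlines are declared in punch configuration
# files. This merely illustrates the two roles with the kafka-python client;
# the topic name, listening port and forwarding target are hypothetical.
import socketserver
from kafka import KafkaProducer, KafkaConsumer

KAFKA = "localhost:9092"      # single-node broker in the one-node setup
TOPIC = "collector-logs"      # hypothetical buffering topic

# Role 1: receive traffic from the collected zone and save it into Kafka.
producer = KafkaProducer(bootstrap_servers=KAFKA)

class LogReceiver(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:                 # one log per line (TCP syslog style)
            producer.send(TOPIC, line.rstrip(b"\n"))

# Role 2: consume Kafka and forward the traffic to the central punch site.
def forward_to_central_site():
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=KAFKA, group_id="forwarder")
    for record in consumer:
        send_to_central(record.value)           # placeholder for the real forwarder

def send_to_central(payload: bytes) -> None:
    ...                                         # e.g. a TLS connection to the central site

if __name__ == "__main__":
    # In a real deployment each role runs as a separate punchline started by Shiva.
    with socketserver.TCPServer(("0.0.0.0", 9999), LogReceiver) as srv:
        srv.serve_forever()
```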

Three Nodes Setup (Highly Available)

  • The collector must be deployed on three underlying physical servers (not merely three logical VMs hosted on fewer physical servers).
  • The Kafka, Zookeeper and Shiva clusters are deployed on three VMs or containers.
  • Punchlines are in charge of
    • receiving the traffic from the collected zone and saving it into Kafka
    • consuming Kafka and forwarding the traffic to the central punch site

The multi-node setup makes it possible to:

  • listen for incoming logs on multiple servers at the same time, providing high availability of the input point through a classical system-level 'Virtual IP' management service clustered on these servers. One listening punchline runs on each input server of the collector site.
  • keep replicated retention data, with a copy of each data fragment on 2 of the 3 servers (replication is handled natively by Kafka, as sketched after this list)
  • provide availability of internal processing (retention, forwarding) through
    • the inbuilt Kafka resilience to single-node failure
    • the capability of the Shiva cluster to restart tasks (e.g. the forwarding punchline) on other nodes in case of node failure
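
As a rough illustration of that replication behaviour, the following sketch creates a buffering topic with a replication factor of 2 using the kafka-python admin client. The topic name and broker addresses are assumptions; on a real collector the topic is typically created by the platform tooling rather than by hand.

```python
# Sketch of the replication setting described above, using kafka-python.
# Topic name and broker addresses are hypothetical placeholders.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(
    bootstrap_servers=["node1:9092", "node2:9092", "node3:9092"])

# replication_factor=2 keeps a copy of every partition on 2 of the 3 brokers,
# so the loss of any single node leaves at least one complete copy available.
admin.create_topics([
    NewTopic(name="collector-logs", num_partitions=3, replication_factor=2)
])
```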

The 3-node Zookeeper cluster ensures data integrity and availability even in case of network partitioning (one node being separated from the other two).

That way, the cluster can rely on a strict majority of the nodes to know it possesses 'the truth' about the data status.

This makes self-restoration of the collection service and of data retention more reliable when a faulty node is repaired and rejoins the two surviving nodes. A 2-physical-node setup is prone to merge conflicts, and would therefore imply more manual operations to handle some incidents, and more potential data loss.
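
The arithmetic behind this is the usual quorum rule, sketched below in Python: a cluster of n nodes needs a strict majority of floor(n/2) + 1 nodes to keep operating, so 3 nodes tolerate the loss or isolation of 1 node, whereas 2 nodes tolerate none.

```python
# Why 3 nodes behave better than 2 under partition: a strict majority
# (quorum) must survive for the cluster to keep a single authoritative
# view of the data.
def quorum(n: int) -> int:
    return n // 2 + 1

for nodes in (2, 3):
    tolerated = nodes - quorum(nodes)
    print(f"{nodes} nodes: quorum={quorum(nodes)}, tolerated node losses={tolerated}")
# 2 nodes: quorum=2, tolerated node losses=0
# 3 nodes: quorum=2, tolerated node losses=1
```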

Except the "input" punchline, that is running once for each input server, the other processing/forwarding punchline of the collector site can run in only 1 instance, scheduled automatically to a random node by Shiva Cluster, and respawned elsewhere in case of node failure.