Reference Architecture overview¶
This section provides examples of punchplatform deployments and associated channel configurations that match classical log/document management needs, together with the design rationale behind each architecture.
The presented architecture elements convey a general design; every provided configuration example should be adapted to the specific needs of each implementation (SLAs, kind and volume of handled data, hardware platform design, etc.).
Reference architecture topics¶
The following main punchplatform architecture topics are presented, as they cover the most common log management needs:
- Highly available log collector site, with forwarding and local retention for handling communication incidents.
- General log management/indexing architecture, with central parsing/indexing of multiple log types collected through multiple remote collection sites.
- Indexed archiving and extraction of archived documents.
- Dual-site replication for fast failover on major incidents or disaster management.
- Hardened deployment with ciphered internal communication inside the punchplatform deployment.
Design Summary¶
This design is the recommended architecture for production environments, as it provides flexibility and resilience.
A major architecture recommendation is to keep the Punch Kibana and operator servers, as well as the processing servers, separate from the data storage and queuing servers.
Processing servers, in charge of running punchlines, can introduce unpredictable resource utilization. Separating processing from data store and administration servers allows each to be sized appropriately in terms of CPU, memory and disk.
Elasticsearch and Clickhouse are memory- and IO-intensive applications, so separating them onto their own resources is advantageous to prevent resource contention.
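As an illustration only, the role separation described above can be sketched as a small inventory model. The group names and service lists below are assumptions for illustration; they are not the actual punchplatform deployment settings format.

```python
# Hypothetical inventory sketch (illustrative names, not the real
# punchplatform deployment settings format): each server group hosts
# a disjoint set of services so it can be sized independently.
reference_roles = {
    "admin":      ["kibana", "punch_operator"],              # light CPU/memory
    "processing": ["punchlines"],                            # bursty CPU/memory
    "data":       ["elasticsearch", "clickhouse", "kafka"],  # memory/IO intensive
}

# Sanity check: no service is co-located across two server groups.
all_services = [s for services in reference_roles.values() for s in services]
assert len(all_services) == len(set(all_services))
```

The point of the disjointness check is the design rule itself: a service that appears in two groups would reintroduce the resource contention this architecture is meant to avoid.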
Failure Tolerance¶
The Punch is designed to handle different failure scenarios of different probabilities. When deploying a Punch end-to-end solution, consider and design for the failure tolerance you require. A resilient ZooKeeper or Elasticsearch cluster needs a minimum of 3 instances; plan for at least 4 if you use S3 MinIO storage.
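The 3-instance minimum follows from majority quorum: a cluster of n nodes stays available only while a strict majority of them is reachable. A minimal sketch of that arithmetic (the function name is ours, not a Punch API):

```python
def max_tolerated_failures(n: int) -> int:
    """Number of nodes that can fail while a strict majority
    (n // 2 + 1) of a ZooKeeper/Elasticsearch-style quorum
    cluster remains alive."""
    return (n - 1) // 2

# 3 nodes tolerate 1 failure; a 4th node does NOT raise tolerance,
# which is why 3 is the usual minimum for quorum-based clusters.
assert max_tolerated_failures(3) == 1
assert max_tolerated_failures(4) == 1
assert max_tolerated_failures(5) == 2
```

The higher MinIO minimum is a different constraint: distributed MinIO relies on erasure coding across drives rather than a voting quorum, hence its larger baseline node count.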