Punch components¶
Abstract
This chapter lists the high-level components that make up punchplatform and summarizes the role of each in the overall picture. Remember that the components actually deployed on your platform depend on the solution you need.
Integrated Open-sources¶
Elasticsearch (OSS Version)¶
Elasticsearch is in charge of indexing documents and processing queries (counting, aggregating, retrieving best hits).
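As an illustration of the counting and aggregating queries mentioned above, the following minimal Python sketch builds an Elasticsearch search body that counts documents per value of a field over a time window. The helper name, the `vendor` field and the `@timestamp` field are assumptions made for this example, not Punch conventions.

```python
import json


def build_count_by_field_query(field, start, end,
                               timestamp_field="@timestamp", top_n=10):
    """Build a search body that buckets documents per value of `field`.

    All names here are illustrative; adapt them to your own index mapping.
    """
    return {
        # size 0: return no individual hits, only the aggregation buckets
        "size": 0,
        # restrict the search to a time window
        "query": {"range": {timestamp_field: {"gte": start, "lt": end}}},
        # one bucket (with its document count) per distinct value of `field`
        "aggs": {"by_value": {"terms": {"field": field, "size": top_n}}},
    }


body = build_count_by_field_query("vendor", "now-1h", "now")
print(json.dumps(body, indent=2))
```

Such a body would typically be sent to the `_search` endpoint of an index. Setting `size` to a positive value instead would return the best-matching hits alongside the buckets.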
Kibana (OSS Version)¶
Kibana is a web front end that lets a user design and run queries, present results in graphical widgets, and group those widgets in dashboards. Kibana also supports application plugins that bring more features, e.g. the Punchplatform plugin and the Punch User Feedback UI.
Elastic Beats (OSS Version):¶
Beats are lightweight metrics/events collectors (usually written in Go). They can send JSON documents directly to Elasticsearch, to Kafka, or to Lumberjack servers (punch lumberjack input node, Logstash). One special case is Metricbeat, which is deployed on all punchplatform servers to gather and centralize system resource metrics. Other Beats include Winlogbeat (Windows event log), Auditbeat (kernel event capture), Filebeat (log file capture) and Packetbeat (network traffic probe).
Apache Kafka:¶
Roughly, a distributed, resilient queuing system. We store logs/documents in Kafka at various stages of their transport/processing. We also store monitoring events (user action audits, task logs) produced while these data are collected or forwarded. Finally, we use it as a means of communication for commands sent to and within Shiva clusters (it is therefore a requirement for Shiva clusters).
Apache Storm:¶
A scalable, reliable computation farm and framework. It lets you design a processing graph (a 'topology') by chaining nodes ('spouts' and 'bolts') and running them in groups of JVMs (the 'workers' of the topology). Storm takes care of the document flow in the topology, regardless of the number of JVMs involved and of their location.
Storm manages the end-to-end acknowledgement of each individual log/document/record (called a 'tuple') to ensure an at_least_once guarantee.
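To make the at_least_once guarantee concrete, here is a minimal Python simulation of acknowledge-or-replay delivery. It is a simplified sketch and does not use the Storm API: the function name, the `max_attempts` parameter and the flaky handler are assumptions made for illustration only.

```python
from collections import deque


def process_at_least_once(records, handler, max_attempts=3):
    """Deliver each record to `handler` until it succeeds (is acknowledged).

    A record whose handling raises is replayed, up to `max_attempts` times.
    Returns (acked, failed) lists.
    """
    pending = deque((record, 1) for record in records)
    acked, failed = [], []
    while pending:
        record, attempt = pending.popleft()
        try:
            handler(record)
            acked.append(record)          # success: the record is acknowledged
        except Exception:
            if attempt < max_attempts:
                pending.append((record, attempt + 1))  # failure: replay later
            else:
                failed.append(record)     # give up after max_attempts
    return acked, failed


seen = []  # every delivery, including replays


def flaky_handler(record):
    seen.append(record)
    # Simulate a transient failure on the first delivery of "b"
    if record == "b" and seen.count("b") == 1:
        raise RuntimeError("simulated downstream failure")


acked, failed = process_at_least_once(["a", "b", "c"], flaky_handler)
print(acked, failed, seen)
```

In this run the record `"b"` reaches the handler twice. That is the practical consequence of at-least-once delivery: no record is lost, but downstream processing has to tolerate duplicates.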
Punch is able to run Storm-like punchlines either in a Storm cluster (as topologies) or as a process outside of any Storm cluster (the 'light' Storm punchline engine).
Apache Spark:¶
A widely used, scalable computation and data manipulation framework, with an ecosystem of libraries, for processing data sets or tasks that require more memory than a single server can provide.
Apache Zookeeper:¶
A scalable, reliable store for critical data, organized like a filesystem. It provides features dedicated to the needs of clustered applications: leadership management, node presence detection, voting. It is a required base for Kafka and Storm.
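The leadership management feature is commonly used through a well-known recipe: each candidate creates an ephemeral sequential node, and the candidate holding the lowest sequence number is the leader. The Python sketch below only simulates that selection rule on node names. It does not talk to a Zookeeper server, and the `worker-` prefix and function name are assumptions made for illustration.

```python
def elect_leader(znode_names):
    """Return the node name with the lowest sequence suffix, or None.

    Names are expected to look like 'worker-0000000007', i.e. a prefix
    followed by '-' and a zero-padded sequence number.
    """
    if not znode_names:
        return None
    return min(znode_names, key=lambda name: int(name.rsplit("-", 1)[1]))


# Three live candidates, each holding one (simulated) ephemeral node
candidates = ["worker-0000000012", "worker-0000000007", "worker-0000000009"]
leader = elect_leader(candidates)
print("leader:", leader)

# When the leader's session ends, its ephemeral node disappears and the
# next lowest sequence number takes over.
candidates.remove(leader)
new_leader = elect_leader(candidates)
print("new leader:", new_leader)
```

Because the nodes are ephemeral, node presence detection and leadership hand-over rely on the same mechanism: a crashed member simply vanishes from the candidate list.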
Elastalert:¶
A rule engine based on Elasticsearch backend for data querying and rule results storage. In punch, it is packaged as a 'standard application' that can run inside a shiva cluster for high availability.
Open Distro Security plugins for Elasticsearch and Kibana:¶
Open Distro is an Apache 2.0-licensed implementation of several Elasticsearch and Kibana plugins that bring added features (features also covered by the paid Elastic X-Pack version of ELK). Punchplatform can integrate the 'security' Elasticsearch and Kibana plugins to secure and control access to the Elasticsearch APIs:
- Can activate SSL between Elasticsearch cluster nodes, and between Elasticsearch and Kibana.
- Applies RBAC restrictions to all Elasticsearch queries, propagating the identity/roles of the connected Kibana user, with the ability to bind to a customer LDAP, AD or Kerberos.
- Audits accesses and queries.
Logstash¶
Part of the ELK stack, Logstash is the log shipping/parsing tool. It is not clustered in itself, so clustering has to be handled by remotely configuring multiple nodes and placing Kafka or a source load balancer in front of them. In Punch, we usually replace this Logstash stage with punchlines that use the punch processor node in a Shiva or Storm cluster. In some cases, Logstash can bring useful input or output connectors (e.g. RELP). For this purpose, punch provides a packaged version of Logstash, usable as a task running inside a Shiva cluster (which gives HA), as a reception or emitting stage only.
Minio¶
Minio is a clustered technology that delivers S3-compatible object storage, with the actual physical storage distributed RAID-like across multiple standard servers or VMs. It can be used to store archived logs, data models or any other object coming out of punch pipelines.
Ceph¶
Ceph is an open technology promoted by Red Hat that delivers both scalable object storage and a shared distributed filesystem, with the actual physical storage distributed RAID-like across multiple standard servers or VMs. It can be used to store archived logs, data models or any other object coming out of punch pipelines, or to store shared files or configuration without a central physical device (no single point of failure).
Punchplatform-specific components¶
Punch operator command-line environment¶
This Linux command-line environment provides:
- libraries of components that can be used to build (through configuration files, not code) Storm-like or Spark stream/batch applications running in Shiva, Spark or Storm.
- operator commands allowing to submit such applications to the Shiva/Spark/Storm cluster for resilient execution
- operator commands to operate the underlying open-source products (Kafka, Elasticsearch, Kibana, Zookeeper)
- developer/integrator commands to test or benchmark punchlet programs, and to inject test flows to running applications
Punchlang library¶
The Punch language is a simple, non-verbose language for building/manipulating structured (JSON) documents. It contains all the usual parsing/filtering operators, and is also a full language with access to Java libraries and constructs.
This library is included in the Storm/Spark punch processor nodes, to allow simple or complex manipulation of documents in streaming and batch applications. It is also available for integration into other technical frameworks (e.g. Apache NiFi).
Punchplatform deployer¶
This is an automated deployment tool that deploys, from a single configuration, all the integrated components of Punch on target VMs or servers for production use. The configuration controls which components are deployed. It is therefore possible to deploy only the punch-specific components and to deploy the off-the-shelf open-source ones (Elasticsearch, Kafka...) by external means, if desired.
Shiva¶
Shiva is a lightweight distributed task scheduler. It works as a distributed 'systemd-like' and distributed 'cron-like' service. You can submit tasks to Shiva for "permanent execution" (like systemd) or "periodic execution" (like cron), with placement groups if needed (e.g. when you need a specific hardware resource such as a CPU-intensive VM or a network interface).
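The idea of placement groups can be sketched as follows. This is a hypothetical Python model, not Shiva's actual configuration format or scheduling algorithm: the `Task` and `Node` classes, the tag names and the "least loaded node" rule are all assumptions chosen to illustrate the principle.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    mode: str                        # "permanent" (systemd-like) or "periodic" (cron-like)
    required_tags: set = field(default_factory=set)
    period_seconds: int = 0          # only meaningful for periodic tasks


@dataclass
class Node:
    name: str
    tags: set
    running: list = field(default_factory=list)


def place(task, nodes):
    """Assign `task` to the least loaded node that carries all required tags.

    Returns the chosen node name, or None if no node qualifies.
    """
    candidates = [n for n in nodes if task.required_tags <= n.tags]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda n: (len(n.running), n.name))
    chosen.running.append(task.name)
    return chosen.name


nodes = [Node("node1", {"common"}), Node("node2", {"common", "cpu_intensive"})]

parser_node = place(Task("parser", "permanent", {"cpu_intensive"}), nodes)
cleanup_node = place(Task("housekeeping", "periodic", {"common"}, 3600), nodes)
gpu_node = place(Task("training", "permanent", {"gpu"}), nodes)
print(parser_node, cleanup_node, gpu_node)
```

Here the CPU-hungry task can only land on `node2`, the periodic task goes to the less loaded `node1`, and a task requiring a tag that no node offers is left unplaced.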
Elastalert plugins¶
The Elastalert packaged by punch as a standard Shiva application includes several punch connector plugins, which handle dynamic loading of rules from Elasticsearch and produce output flows (to Kafka or Elasticsearch) with an ECS normalization compatible with the Cybels Analytics correlation/detection rule sets.
Punchplatform main Kibana plugin¶
This Kibana plugin provides additional (optional) features in Kibana:
- access to local punchplatform documentation
- resources viewer/editor
- CSV extraction tool
- online grok/punchlang expressions tester
- visual punchline viewer/designer
Punchplatform User Feedback plugin¶
This Kibana plugin makes it possible to build Kibana views that accept user input, in order to annotate chosen data with some level of user feedback. This matters most in supervised AI use cases, where the user must 'tag' false or good detections to help improve the machine learning and the resulting models.
Punch API gateway¶
This is a REST web server providing:
- the service APIs required to use any of the Punch Kibana plugins
- (optional) exposure or filtering of Elasticsearch requests, optionally including queries issued by Kibana. This filtering can be customized for the platform, to help protect the Elasticsearch cluster from overload by queries that are too heavy or too wide.
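The kind of request filtering described above could look like the following Python sketch. It is not the gateway's actual rule set or API: the two rules (a cap on the number of requested hits and a mandatory time range), the `MAX_HITS` limit and the function names are assumptions chosen to show how heavy or wide queries might be rejected before they reach Elasticsearch.

```python
MAX_HITS = 1000  # hypothetical platform-specific limit


def has_range_on(node, field):
    """Return True if a `range` clause on `field` appears anywhere in the query."""
    if isinstance(node, dict):
        range_clause = node.get("range")
        if isinstance(range_clause, dict) and field in range_clause:
            return True
        return any(has_range_on(value, field) for value in node.values())
    if isinstance(node, list):
        return any(has_range_on(item, field) for item in node)
    return False


def check_query(body, max_hits=MAX_HITS, timestamp_field="@timestamp"):
    """Return (allowed, reason) for an Elasticsearch search body."""
    # Elasticsearch returns 10 hits when `size` is not specified
    if body.get("size", 10) > max_hits:
        return False, "size exceeds limit"
    if not has_range_on(body.get("query", {}), timestamp_field):
        return False, "missing time range"
    return True, "ok"


too_heavy = {"size": 50000, "query": {"match_all": {}}}
too_wide = {"query": {"match_all": {}}}
acceptable = {
    "size": 100,
    "query": {"bool": {"filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]}},
}

for query in (too_heavy, too_wide, acceptable):
    print(check_query(query))
```

A real deployment would tune such rules per platform, for example by also bounding the width of the time range or the depth of aggregations.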