Components

Abstract

This chapter lists the high-level components that make up Punchplatform and summarizes the role of each in the overall picture. Remember that the components actually deployed on your platform depend on the solution you need. In particular, COTS deployments are outside the scope of the Punchplatform deployment tools; some of them may be provided through Thales KAST.

Integrated Open-sources

Elasticsearch (OSS Version) or Opensearch

In charge of indexing documents and processing queries (counting, aggregating, retrieving the best hits).

Kibana (OSS Version) or Opensearch dashboards

Web front end that lets a user design and run queries, present results in graphical widgets, and group those widgets into dashboards. Kibana also supports application plugins that bring more features, e.g. the Punchplatform plugin and the Punch User Feedback UI.

Elastic Beats (OSS Version):

Lightweight metrics/events collectors (usually written in Go). They can send JSON documents directly to Elasticsearch, to Kafka, or to Lumberjack servers (Punch lumberjack input node, Logstash). One special case is Metricbeat, which is deployed on all Punchplatform servers to gather and centralize system resource metrics. Others include Winlogbeat (Windows event log), Auditbeat (kernel event capture), Filebeat (log file capture) and Packetbeat (network traffic probe).

Apache Kafka:

Roughly, a distributed, resilient queuing system. We store logs/documents in it at various stages of their transport/processing. We also store monitoring events (user action audits, task logs) produced during the collection or forwarding of these data. We also use it as a means of communication for commands to and within Shiva clusters (it is therefore a requirement for Shiva clusters).
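The idea that the same stored records can be read independently at several processing stages can be sketched with a toy append-only log. This is not the Kafka client API: the `TopicLog` class, its single partition, and the automatic offset commit on `poll` are simplifying assumptions made for illustration only.

```python
class TopicLog:
    """Toy single-partition log with one read offset per consumer group."""

    def __init__(self):
        self._records = []   # append-only storage
        self._offsets = {}   # consumer group -> next offset to read

    def append(self, record):
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next records for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for line in ["log-1", "log-2", "log-3"]:
    log.append(line)

# Two hypothetical stages read the same data at their own pace.
print(log.poll("parser", max_records=2))  # ['log-1', 'log-2']
print(log.poll("archiver"))               # ['log-1', 'log-2', 'log-3']
print(log.poll("parser"))                 # ['log-3']
```

Records are never removed by a read, which is what lets a slow or restarted stage resume from its own offset without affecting the others.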

Apache Storm:

A scalable, reliable computation farm and framework. It lets you design a processing graph (a 'topology') by chaining nodes ('spouts' and 'bolts') and running them in a set of JVMs (the 'workers' of the topology). Storm takes care of the document flow in the topology, regardless of the number and location of the JVMs involved.

Storm manages end-to-end acknowledgement of each individual log/document/record (called a 'tuple') to ensure an at_least_once guarantee.
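The at_least_once principle can be illustrated with a toy model that is independent of Storm's actual API: a source keeps every emitted tuple pending until it is acknowledged, and replays any tuple whose processing failed. All names below (`ReplayingSource`, `process_all`, `make_flaky_handler`) are hypothetical and exist only for this sketch.

```python
from collections import deque


class ReplayingSource:
    """Toy source that keeps each tuple pending until it is acked."""

    def __init__(self, records):
        self._queue = deque(enumerate(records))  # (tuple_id, payload)
        self._pending = {}

    def next_tuple(self):
        """Emit the next tuple, or None when nothing is left to emit."""
        if not self._queue:
            return None
        tuple_id, payload = self._queue.popleft()
        self._pending[tuple_id] = payload
        return tuple_id, payload

    def ack(self, tuple_id):
        # Processing succeeded: forget the tuple.
        self._pending.pop(tuple_id, None)

    def fail(self, tuple_id):
        # Processing failed: queue the tuple again for a later replay.
        self._queue.append((tuple_id, self._pending.pop(tuple_id)))

    def pending_count(self):
        return len(self._pending)


def process_all(source, handler):
    """Drive the source until every tuple has been acked."""
    acked = []
    while True:
        item = source.next_tuple()
        if item is None:
            return acked
        tuple_id, payload = item
        try:
            handler(payload)
        except RuntimeError:
            source.fail(tuple_id)
        else:
            source.ack(tuple_id)
            acked.append(payload)


def make_flaky_handler(attempts):
    """Handler that records every attempt and fails once on payload 'b'."""
    failed_once = set()

    def handler(payload):
        attempts.append(payload)  # side effect happens before the failure
        if payload == "b" and payload not in failed_once:
            failed_once.add(payload)
            raise RuntimeError("transient failure")

    return handler


attempts = []
source = ReplayingSource(["a", "b", "c"])
acked = process_all(source, make_flaky_handler(attempts))
print(acked)     # ['a', 'c', 'b']
print(attempts)  # ['a', 'b', 'c', 'b']
```

The record 'b' is attempted twice: nothing is lost, but downstream processing has to tolerate duplicates, which is the practical meaning of "at least once".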

Punch can run Storm-like punchlines either in a Storm cluster (as topologies) or as a standalone process outside any Storm cluster (the 'light' Storm punchline engine).

Apache Spark:

A widely used, scalable computation and data manipulation framework, with its ecosystem of libraries, for processing data sets or running tasks that require more memory than a single server can provide.

Apache Zookeeper:

A scalable, reliable store for critical data, organized like a filesystem. It provides features dedicated to the needs of clustered applications: leadership management, node presence detection, voting. It is a required base for Kafka and Storm.
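A common leadership pattern built on Zookeeper has each candidate create an ephemeral, sequentially numbered node: the candidate holding the lowest sequence number is the leader, and when its node disappears the next lowest takes over. The sketch below simulates that rule in plain Python without any Zookeeper client; the `Election` class and its methods are hypothetical.

```python
import itertools


class Election:
    """Toy model of 'lowest sequence number wins' leader election."""

    def __init__(self):
        self._sequence = itertools.count()
        self._nodes = {}  # sequence number -> candidate name

    def join(self, candidate):
        """Register a candidate and return its sequence number."""
        seq = next(self._sequence)
        self._nodes[seq] = candidate
        return seq

    def leave(self, seq):
        """Remove a candidate, as if its ephemeral node had vanished."""
        self._nodes.pop(seq, None)

    def leader(self):
        """Return the candidate with the lowest sequence number, if any."""
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]


election = Election()
first = election.join("node-a")
election.join("node-b")
print(election.leader())  # node-a
election.leave(first)     # node-a crashes or disconnects
print(election.leader())  # node-b
```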

Elastalert:

A rule engine based on Elasticsearch backend for data querying and rule results storage. In punch, it is packaged as a 'standard application' that can run inside a shiva cluster for high availability.

Open Distro Security plugins for Elasticsearch and Kibana (already included in Opensearch base distribution):

Open Distro is an Apache 2.0 licensed implementation of several Elasticsearch and Kibana plugins that bring added features. Punch integrates the Open Distro security plugins for Elasticsearch and Kibana, which provide:

  • TLS encryption of traffic between the Elasticsearch cluster nodes, and between nodes and Kibana
  • RBAC restrictions on Elasticsearch queries, with propagation of the identity/roles of the connected Kibana user, and binding capabilities to a customer LDAP, AD or Kerberos
  • Auditing of access and queries

Logstash

Part of the ELK stack, Logstash is the log shipping/parsing tool. It is not in itself resilient or scalable, so clustering has to be handled by remotely configuring multiple nodes and by leveraging a front Kafka or a source load balancer. In Punch, the (flink|spark|storm) punchlines provide a better option as they can scale arbitrarily. That said, Logstash is useful at the edge, in charge of receiving remote traffic through its many supported connectors (such as RELP). In that role, there is no need for high availability anyway.

Fluentbit

Fluent Bit is an open source log processor and forwarder, typically used in the Kast stack to convey logs from the platform and all pods to a log backend (such as Elasticsearch or Loki).

Minio

Minio provides S3-compatible object storage, with its physical storage distributed RAID-like across multiple standard servers or VMs. It is used to store all sorts of data: archived logs, ML models, images, parquet files, etc.

Ceph

Ceph is a Red Hat supported scalable object storage and shared distributed filesystem. Like Minio, its physical storage is distributed RAID-like across multiple standard servers or VMs. It can provide a storage backend for container volumes in a K8s deployment, and is available in the Kast stack through native Rook integration. Ceph can of course be used to store all sorts of data: archived logs, ML models, images, parquet files, etc.

Elastalert

Elastalert is a simple, Elasticsearch-centric rules engine. It is used for example by the Cybels Analytics layer to deploy cybersecurity-oriented detection rules based on the Cybels Analytics extension of the ECS log data model.

Punch components

Punch Console

This Linux command-line environment provides:

  • The commands and configuration files used to deploy Punch Services inside a K8s platform (Operator service, Artifacts service, API gateway service...)
  • The means to download resources from the internet, to allow later offline deployment of Punchplatform
  • Additional developer/integrator commands to test or benchmark punchlet programs, and to inject test data to the platform for testing or capacity planning.
  • Examples of stormlines
  • Example dashboards for monitoring Punch apps
  • Various commands to operate the underlying opensource products (Kafka, Elastic, Kibana, Zookeeper)

Punchlang library

The Punch language is a simple, non-verbose language for manipulating structured (json) documents. It contains many parsing/filter library functions and provides access to standard java APIs.

This library is included in punchline processor nodes, to allow simple or complex manipulations of documents in streaming and batch applications. It is also available for integration into other technical frameworks (e.g. NiFi).

Punchplatform [operator](../../Kube/User/Operator.md) and punchline CRDs

This is a K8s extension, deployed through a punchplatform operator HELM chart, that:

  • manages the configuration validation and enrichment of punchlines that the user submits to the K8s platform
  • manages the lifecycle of punchline objects in K8s, especially to manage the underlying K8s execution resources, and the scheduling of Punchplatform Plans

Punchplatform artifacts server

This is a REST Web server providing both a REST API endpoint and a Graphical User Interface. It provides browsing/storage/retrieval of standardized punchline resources packages:

  • parsers or business-oriented punchlet code files
  • data enrichment resources (inventory, geographic data, business reference data, security context...)
  • consistent packages of multiple above files

The back-end storage can be an S3/Minio object storage, or a simple shared/local filesystem.

Punchplatform Extraction server

This is a REST Web server providing both a REST API endpoint and a Graphical User Interface. It lets you request CSV extractions of Elasticsearch query results (typically fewer than 1 million documents, which is still more than is easily accessible through Kibana or Opensearch dashboards), and download the result through the web interface or GUI.
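To make the extraction idea concrete, here is a minimal sketch of turning query hits into CSV text. It does not use the Extraction server API: the `hits_to_csv` helper and the sample documents are hypothetical, and hits are assumed to be flat dictionaries already taken out of the query response.

```python
import csv
import io


def hits_to_csv(hits, columns):
    """Serialize flat hit dictionaries to CSV, keeping only `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields that were not requested
        restval="",             # leave missing fields empty
        lineterminator="\n",
    )
    writer.writeheader()
    for hit in hits:
        writer.writerow(hit)
    return buffer.getvalue()


sample_hits = [
    {"host": "srv1", "status": 200, "bytes": 512},
    {"host": "srv2"},  # no status recorded for this document
]
print(hits_to_csv(sample_hits, ["host", "status"]))
```

Fixing the column list up front keeps the output rectangular even when documents do not all carry the same fields.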

User Feedback plugins

These Kibana- and Superset-compatible plugins make it possible to build Kibana views with user input capabilities, in order to annotate chosen data with some level of user feedback. This matters most in supervised AI use cases, where the user must 'tag' false or good detections to help improve the machine learning and the resulting models.

Punchplatform Kibana Plugin

A specific Punchplatform Kibana plugin provides additional (optional) features in Kibana:

  • access to local punchplatform documentation
  • online grok/punchlang expressions tester
  • Visual punchline viewer/designer

API Gateway server

This is a REST Web server providing:

  • services API required to use any of the Punch Kibana plugins
  • Elasticsearch requests (optional) exposure or filtering (including optionally Kibana issued queries). This filtering of requests can be customized for the platform, to help protect the Elasticsearch cluster from overload by too heavy/too wide queries
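As an illustration of what such request filtering could look like, the sketch below rejects Elasticsearch search bodies that ask for too many hits or that carry no `range` restriction at all. It is not the API Gateway's actual rule set: the `check_query` function, the `MAX_SIZE` threshold and the two rules are assumptions chosen for the example.

```python
MAX_SIZE = 10_000  # hypothetical per-request hit limit


def contains_range_clause(node):
    """Return True if a 'range' clause appears anywhere in the query tree."""
    if isinstance(node, dict):
        return "range" in node or any(
            contains_range_clause(value) for value in node.values()
        )
    if isinstance(node, list):
        return any(contains_range_clause(item) for item in node)
    return False


def check_query(body, max_size=MAX_SIZE):
    """Return (allowed, reason) for an Elasticsearch search request body."""
    # Elasticsearch returns 10 hits when 'size' is not specified.
    if body.get("size", 10) > max_size:
        return False, "size exceeds limit"
    if not contains_range_clause(body.get("query", {})):
        return False, "query has no range restriction"
    return True, "ok"


bounded = {
    "size": 100,
    "query": {
        "bool": {"filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]}
    },
}
print(check_query(bounded))                         # (True, 'ok')
print(check_query({"query": {"match_all": {}}}))    # (False, 'query has no range restriction')
```

A real deployment would likely also bound the width of the time range and the cost of aggregations; the point here is only that the check happens before the request reaches the cluster.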