Quick tour of Punch Components
We have already seen many of the punch components during Track 1. Let's review them all.
The punch provides you with a few key concepts: most of them standard, all of them natural and intuitive. Once they are clear to you, the punch will have no secrets left! We'll go through the different concept levels using a bottom-up approach: punchlines, then applications, then plans, then channels, then tenants, up to a platform.
Let's go through the Punchlines overview.
What is a 'punchline'?
It is a Punch pipeline of course.
It is a DAG (Directed Acyclic Graph) of nodes that process data from inputs to outputs.
It is defined by a configuration file.
What are punchline nodes?
They are components, either provided by the Punch operator environment or developed for a custom solution, chained together in the punchline DAG.
There are:
- Input Nodes (listening to network ports for various protocols, pulling data from servers or storage clusters, reading files...)
- Processing Nodes (programmable Punch Node, Spark SQL node, MLlib AI nodes, filter nodes...)
- Output Nodes (forwarding to some other system, writing to a queue or a database...)
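To make the DAG idea concrete, here is a minimal sketch in plain Python; it is not Punch configuration and uses no Punch API. The node functions (`read_lines`, `parse`, `keep_high`, `write_out`) and the sample records are hypothetical stand-ins for an input node, two processing nodes and an output node. Each node declares the nodes it reads from, and the standard `graphlib.TopologicalSorter` (Python 3.9+) yields an order in which every node runs after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical node functions: plain-Python stand-ins, NOT Punch node APIs.
def read_lines():
    # Input node: in a real punchline this could be a socket or file reader.
    return ["alice,42", "bob,17", "carol,99"]

def parse(lines):
    # Processing node: turn raw text lines into structured records.
    records = []
    for line in lines:
        user, score = line.split(",")
        records.append({"user": user, "score": int(score)})
    return records

def keep_high(records):
    # Filter node: drop records whose score is below 40.
    return [r for r in records if r["score"] >= 40]

def write_out(records):
    # Output node: in a real punchline this could write to Kafka or a database.
    return [f"{r['user']}={r['score']}" for r in records]

# The DAG: each node name maps to the set of nodes it reads from.
GRAPH = {"parse": {"input"}, "filter": {"parse"}, "output": {"filter"}}
NODES = {"input": read_lines, "parse": parse, "filter": keep_high, "output": write_out}

def run(graph, nodes):
    """Execute every node once, in an order where inputs come first."""
    results = {}
    for name in TopologicalSorter(graph).static_order():
        upstream = sorted(graph.get(name, set()))
        # Nodes without upstream are inputs; others receive their parents' outputs.
        results[name] = nodes[name](*(results[p] for p in upstream))
    return results

print(run(GRAPH, NODES)["output"])  # ['alice=42', 'carol=99']
```

Because the graph is acyclic, a topological order always exists; a cycle would make `static_order()` raise `graphlib.CycleError`.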
Where can I run punchlines?
In Storm, in Spark, in a single Unix process, or in a container executing a single Unix process.
Is a punchline used for streaming or batch processing?
Both are possible:
Storm-like punchlines (running in Shiva or Storm) are mostly used for stream processing.
Spark punchlines are mostly used for batch processing, but if you use Spark Streaming input nodes, you can do stream processing in micro-batches.
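The idea of "stream processing with batches" can be sketched in a few lines of Python. This only illustrates micro-batching and is neither Spark nor Punch code: Spark Streaming cuts batches by time interval, whereas the hypothetical `micro_batches` helper below groups by record count to keep the example deterministic.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group a (possibly unbounded) iterator into lists of at most batch_size."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch

# Each batch is processed as a whole, like a small batch job.
totals = [sum(batch) for batch in micro_batches(range(7), 3)]
print(totals)  # [3, 12, 6]
```

A record-at-a-time (Storm-like) punchline would instead handle each element as soon as it arrives, trading batch efficiency for lower latency.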
How can I scale the throughput of a Storm-like punchline?
For moderate scaling, you can increase the number of threads/executors of a node in the punchline, so that the process can use more CPUs, with the load balanced across these executors.
To scale beyond the resources of a single process or VM, you can run multiple instances of a punchline that cooperate (usually by consuming data from a Kafka queue).
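The multi-instance pattern relies on the queue spreading its partitions among the cooperating instances, so that each one processes a disjoint share of the data. The Python sketch below simulates a simple round-robin assignment; Kafka's real consumer-group assignors (range, round-robin, sticky) differ in detail, and the instance names and partition counts are hypothetical.

```python
def assign_partitions(num_partitions, instances):
    """Round-robin assignment of partition ids to punchline instances."""
    assignment = {name: [] for name in instances}
    for partition in range(num_partitions):
        owner = instances[partition % len(instances)]
        assignment[owner].append(partition)
    return assignment

# Hypothetical: three copies of the same punchline reading a 6-partition topic.
print(assign_partitions(6, ["instance-a", "instance-b", "instance-c"]))
# {'instance-a': [0, 3], 'instance-b': [1, 4], 'instance-c': [2, 5]}
```

Since a partition is consumed by only one instance of the group at a time, running more instances than partitions leaves the extra ones idle: the partition count caps this form of scaling.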
Among the following applications, which are a good match for a punchline?
1) a REST microservice
2) an IoT device ingestion process
3) a real-time detection streaming app
4) an e-commerce shopping cart app on top of a SQL database
5) a prediction service exposed as a REST endpoint
6) a prediction batch application executed on large datasets
7) an anomaly detection application executed on real-time data
Answers:
- 4) Probably not.
- 2), 3), 6), 7) Totally, yes.
- 5) Yes (it has already been prototyped).
- 1) Depends (sure, if it is an ingestion REST web service; more complex if you need data retrieval).
Go through the Applications overview.
Here is a fictional sample solution built on punch. It receives data, saves it to an archive (i.e. cold long-term storage) and to Elasticsearch (hot online data), exposes the hot data to users through Kibana dashboards, and generates real-time alerts. It is composed of five punch applications: four punchlines, plus the punch Elastalert application.
- an application is some part of your "solution" or of its monitoring.
- an application runs inside a Storm or Shiva cluster
- an application can be a stream-processing process, a periodic process execution (including punch 'plans'), or any custom command (daemon or periodic task)
Let's have a look at one of the inbuilt Punch application types: Spark plans.
Go through the Plans overview.
- A plan is an inbuilt punch type of application.
- A plan manages the periodic execution of a batch punchline (most often, a Spark punchline), with updated settings to ensure processing of successive data time slices.
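The "successive data time slices" idea can be illustrated with a small Python sketch: each periodic run receives a `[from, to)` window that starts exactly where the previous one ended, so no data is skipped or processed twice. The `from`/`to` setting names, the `plan_settings` helper and the 10-minute period are hypothetical; they are not the actual parameters a Punch plan injects into its punchline.

```python
from datetime import datetime, timedelta, timezone

def plan_settings(start, period, runs):
    """Yield per-run settings covering contiguous, non-overlapping time slices."""
    for i in range(runs):
        lower = start + i * period
        # Hypothetical keys: a batch punchline could use them to bound its query.
        yield {"from": lower.isoformat(), "to": (lower + period).isoformat()}

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for settings in plan_settings(start, timedelta(minutes=10), 3):
    print(settings)
```

The first run covers 00:00 to 00:10 UTC, the second 00:10 to 00:20, and so on; the half-open interval is what keeps consecutive slices from overlapping.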
- Although the punchline runs in the Spark runtime engine, the 'plan' application itself is submitted to a Shiva cluster.
Go through the Channels overview.
Channels simply let you organise your applications into convenient groups. For example, our previous sample solution could be organised into three channels named (say) input, predict and alert.
Once defined, you can manage your channel with commands such as:
channelctl --start yourtenant/input
- a channel is a 'group' of applications, allowing you to manage (submit/stop) this group with a single command if you desire
- a channel is also a monitored entity (by the standard "channels monitoring" application)
- you decide the 'channels' of your solution depending on what makes sense for operation/maintenance, metrics grouping, templating...
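The grouping behaviour can be modelled with two Python dataclasses. This is a hypothetical in-memory model, not how Punch implements channels or `channelctl`: the application names and the cluster each one runs on are invented to match the sample solution, and "starting" an application just flips a flag.

```python
from dataclasses import dataclass, field

@dataclass
class Application:
    name: str
    cluster: str  # where it runs, e.g. "storm" or "shiva" (hypothetical)
    running: bool = False

@dataclass
class Channel:
    name: str
    applications: list[Application] = field(default_factory=list)

    def start(self):
        # A single command acts on every application of the group.
        for app in self.applications:
            app.running = True

    def stop(self):
        for app in self.applications:
            app.running = False

# Hypothetical split of the five sample applications into three channels.
channels = {
    "input": Channel("input", [Application("receiver", "storm"),
                               Application("archiver", "storm"),
                               Application("indexer", "storm")]),
    "predict": Channel("predict", [Application("prediction", "storm")]),
    "alert": Channel("alert", [Application("elastalert", "shiva")]),
}
channels["input"].start()
print({name: [a.running for a in ch.applications] for name, ch in channels.items()})
# {'input': [True, True, True], 'predict': [False], 'alert': [False]}
```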
Go through the Tenants overview.
A Platform is the set of servers hosting the punchplatform integrated OSS and punch-specific components that execute and monitor your applications and store their data, using a single configuration tree.
You decide how many platforms to split your whole system into. It is often good practice to have separate platforms for different sites, because it helps operate the sites independently in case of network interruption or site disaster.
Therefore, a local collection/LTR site is often one platform, while a central back office processing and storing data is another one. Each platform has its own identifier, so that the metrics and monitoring data it produces can be told apart, even when they are forwarded to and centralized in a single monitoring system.
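Why the identifier matters is easy to see in a sketch: once metrics from several platforms land in one central monitoring store, only a platform tag tells them apart. The Python below is illustrative only; the `platform_id` field name, the platform names and the metric values are hypothetical, not the Punch metrics schema.

```python
from collections import defaultdict

def tag(metric, platform_id):
    """Return a copy of the metric carrying the emitting platform's identifier."""
    return {**metric, "platform_id": platform_id}

# Three platforms emit the same metric name; each tags it before forwarding.
central_store = [
    tag({"name": "events.received", "value": 120}, "collector-site-1"),
    tag({"name": "events.received", "value": 80}, "collector-site-2"),
    tag({"name": "events.received", "value": 200}, "back-office"),
]

# Central monitoring can still aggregate per platform.
per_platform = defaultdict(int)
for metric in central_store:
    per_platform[metric["platform_id"]] += metric["value"]
print(dict(per_platform))
# {'collector-site-1': 120, 'collector-site-2': 80, 'back-office': 200}
```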
Go through the Platform overview.
Each platform has a configuration, best viewed as a structured filesystem tree.
Go through the Configuration overview for a peek at its structure, which matches the concepts we have seen.
Together with the trainer sharing their screen, let us see a punch in action:
- a (standalone) platform (or a pre-deployed training platform)
- a complete tenant configuration tree
- starting platform and tenant channels
- injecting logs and viewing parsed logs in Kibana
- viewing dashboards on cyber data
- a look at the monitoring dashboards for the channels and platform
Security patterns and features
Punch addresses security risks through several optional features:
- tenant concept at the heart of our design (useful even in a single-customer context, to separate business-level and platform-management level)
- SSL capabilities on connector punchline nodes (see the Reference Guide)
- Kibana-Elastic and Elastic-Elastic SSL (through the OpenDistro security plugin)
- N-tiering / filtering capability through the Punch Gateway
- RBAC enforcement at Kibana/ES level (through OpenDistro security), or inter-tenant isolation through separate Kibana instances + ModSecurity rulesets