Punch Components and concepts¶

Quick tour of Punch Components¶

We have already seen many of the punch components during the previous track. Let's review them all.

Concepts¶

The punch provides you with a few key concepts. Most of them are standard, and all of them are natural and intuitive. Once they are clear to you, the punch will hold no more secrets! We'll go through the different levels of concepts, from the most 'local' to the most 'global'.

punchlines --> applications / plans --> channels --> tenants --> platform

Punchlines¶

Let's go through the Punchlines overview.

Key points

What is a 'punchline' ?

It is a Punch pipeline of course.

It is a DAG (Directed Acyclic Graph) of nodes that process data from inputs to outputs.

It is defined by a configuration file.

What are punchline nodes ?

They are components provided by the Punch operator environment, or developed for a custom solution, and chained together in the punchline DAG.

There are:

- Input nodes (listening to network ports for various protocols, pulling data from servers or storage clusters, reading files...)
- Processing nodes (programmable Punch node, Spark SQL node, MLlib AI nodes, filter nodes...)
- Output nodes (forwarding to some other system, writing to a queue or a database...)
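
To make the DAG idea concrete, here is a minimal Python sketch. It is not the actual Punch configuration format: the node names, the `kind` labels and the `subscribes_to` key are assumptions made up for the illustration. It only shows that a valid punchline can be ordered from inputs to outputs, and that a graph containing a cycle is rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical punchline description: node names, 'kind' labels and the
# 'subscribes_to' key are illustrative, not the real Punch schema.
PUNCHLINE = {
    "syslog_input": {"kind": "input", "subscribes_to": []},
    "parser": {"kind": "processing", "subscribes_to": ["syslog_input"]},
    "filter": {"kind": "processing", "subscribes_to": ["parser"]},
    "elastic_output": {"kind": "output", "subscribes_to": ["filter"]},
    "archive_output": {"kind": "output", "subscribes_to": ["parser"]},
}


def execution_order(punchline):
    """Return node names so that each node comes after the nodes it reads from.

    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    graph = {name: set(node["subscribes_to"]) for name, node in punchline.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err


order = execution_order(PUNCHLINE)
print(order)
```

Several orders can be valid for the same graph; the only guarantee is that an input node always appears before the processing and output nodes that depend on it.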

Where can I run punchlines ?

In Storm, in Spark, in a single Unix process, or in a container executing a single Unix process.

Is a punchline used for streaming or batch processing ?

Both are possible:

Storm-like punchlines (running in Shiva or Storm) are mostly for stream processing.
Spark punchlines are mostly for batch processing, but if you use spark-streaming input nodes, you can also do stream processing, executed as a series of small batches.

How can I scale the throughput of a Storm-like punchline ?

For moderate scaling, you can increase the number of threads/executors of a node in the punchline, so that the process can use more CPUs in a load-balanced way.
For scaling beyond the resources of a single process or VM, you can run multiple cooperating instances of a punchline (usually consuming the data from a Kafka queue).
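
The second approach can be pictured with a small Python sketch. It is a simplified model, assuming that the queue is split into partitions and that each partition is read by exactly one punchline instance at a time. In a real deployment the assignment is handled by Kafka and the punchline runtime, not by code like this, and the instance names are hypothetical.

```python
def assign_partitions(partitions, instances):
    """Spread queue partitions over punchline instances in round-robin order."""
    if not instances:
        raise ValueError("at least one punchline instance is required")
    assignment = {instance: [] for instance in instances}
    for index, partition in enumerate(partitions):
        owner = instances[index % len(instances)]
        assignment[owner].append(partition)
    return assignment


# Hypothetical instance names: 6 partitions shared by 3 instances.
instances = ["punchline-1", "punchline-2", "punchline-3"]
assignment = assign_partitions(list(range(6)), instances)
for instance, owned in assignment.items():
    print(instance, owned)
```

In this model, adding an instance spreads the same partitions over more processes. Once there are more instances than partitions, the extra instances receive nothing to read.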

Among the following applications, which are a good match for a punchline ?

1) a REST micro service
2) an IoT device ingestion process
3) a real-time detection streaming app
4) an e-commerce shopping cart app on top of a SQL database
5) a prediction service exposed as a REST endpoint
6) a prediction batch application executed on large datasets
7) an anomaly detection application executed on real-time data

Some answers

4) Probably not.
2) 3) 6) 7) Totally, yes.
5) Yes (it has already been prototyped).
1) It depends (sure if it is an ingestion REST web service; more complex if you need data retrieval).

Applications¶

Go through the Applications overview.

Here is a fictitious sample solution built on punch. It receives some data, saves it to an archive (i.e. cold long-term storage) and to Elasticsearch (hot online data), exposes the hot data to users through Kibana dashboards, and generates some real-time alerts. It is composed of five punch applications: four of them are punchlines, and the fifth is the punch elastalert application.

Key points

an application is some part of your "solution" or of its monitoring.
an application runs inside a Storm or Shiva cluster.
an application can be a stream-processing process, a periodically executed process (including punch 'plans'), or any custom command (daemon or periodic task).

Plans¶

Let's have a look at one of the inbuilt Punch application types: Spark plans.

Go through the Plans overview.

Key points

A plan is an inbuilt type of punch application.
A plan manages the periodic execution of a batch punchline (most often, a Spark punchline), with settings updated at each run so that successive runs process successive time slices of data.
Although the punchline runs in the Spark runtime engine, the 'plan' application itself is submitted as an application to a Shiva cluster.
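
The periodic behaviour of a plan can be sketched as follows. This is an illustration only, assuming a hypothetical settings template with `{start}` and `{end}` placeholders; refer to the Plans overview for the real plan configuration format. Each run receives the next time slice, so that successive executions cover the data without gaps or overlaps.

```python
from datetime import datetime, timedelta, timezone


def time_slices(start, period, count):
    """Yield `count` consecutive, non-overlapping (begin, end) time slices."""
    for i in range(count):
        begin = start + i * period
        yield begin, begin + period


def render_settings(template, begin, end):
    """Fill a settings template (hypothetical format) with slice boundaries."""
    values = {"start": begin.isoformat(), "end": end.isoformat()}
    return {key: text.format(**values) for key, text in template.items()}


# Hypothetical template: keys and placeholders are invented for this sketch.
TEMPLATE = {"query_from": "{start}", "query_to": "{end}"}

first_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
runs = [
    render_settings(TEMPLATE, begin, end)
    for begin, end in time_slices(first_run, timedelta(hours=1), 3)
]
for settings in runs:
    print(settings)
```

The end of one slice is the start of the next one, which is what lets a batch punchline be re-run periodically on fresh data.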

Channel¶

Go through the Channels overview.

Channels simply let you organise your applications into convenient groups. For example, the applications of our previous sample solution could be organised into three channels named (say) input, predict and alert.

Once defined, you can manage your channel with commands such as:

Key points

a channel is a 'group' of applications, allowing you to manage (submit/stop) this group with a single command if you desire
a channel is also a monitored entity (by the standard "channels monitoring" application)
you decide the 'channels' of your solution depending on what makes sense for operation/maintenance purposes, metrics grouping purposes, templating purposes...
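
As a mental model only, a channel can be pictured as a named group of applications that are started or stopped together. The Python sketch below is not the Punch command-line tooling; the class names and the application names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    name: str
    running: bool = False


@dataclass
class Channel:
    name: str
    applications: list = field(default_factory=list)

    def start(self):
        """Start every application of the channel in one operation."""
        for app in self.applications:
            app.running = True

    def stop(self):
        """Stop every application of the channel in one operation."""
        for app in self.applications:
            app.running = False

    def status(self):
        """Report the state of each application, e.g. for monitoring."""
        return {app.name: app.running for app in self.applications}


# Hypothetical grouping of two applications into an 'input' channel.
input_channel = Channel("input", [Application("receiver"), Application("archiver")])
input_channel.start()
print(input_channel.status())
```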

Tenant¶

Go through the Tenants overview.

Key points

- tenants can be customers or organizations with separate document sets, each needing views on its own data only
- tenants are associated with separate configurations and naming conventions for logical isolation
- some features in punch help provide logical isolation: 
    - separate pipeline configurations, Kafka queues and Elastic indices, all named with a standard per-tenant prefix
    - per-tenant command-line operations (the operator always has to state the tenant they are working on)
    - multiple Kibana instances with filtered index lists OR Opendistro RBAC applied to Elasticsearch index access
    - housekeeping protection that reduces the risk of cross-tenant data erasure
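
The prefixing idea can be illustrated with a short Python sketch. The naming pattern `<tenant>-<channel>-<kind>` and the helper names are assumptions chosen for the example, not the actual Punch conventions. The point is that a predictable prefix lets a tool list resources by tenant, or refuse to delete a resource that belongs to another tenant.

```python
import re

# Assumed rule for this sketch: tenant names never contain the '-' separator,
# so that one tenant's prefix can never match another tenant's resources.
TENANT_NAME = re.compile(r"[a-z0-9_]+")


def tenant_prefix(tenant):
    """Return the name prefix of a tenant, after validating the tenant name."""
    if not TENANT_NAME.fullmatch(tenant):
        raise ValueError(f"invalid tenant name: {tenant!r}")
    return f"{tenant}-"


def resource_name(tenant, channel, kind):
    """Build a tenant-prefixed name for a queue, an index, etc. (hypothetical)."""
    return f"{tenant_prefix(tenant)}{channel}-{kind}"


def owned_by(tenant, names):
    """Keep only the resources whose name carries the tenant prefix."""
    prefix = tenant_prefix(tenant)
    return [name for name in names if name.startswith(prefix)]


def check_deletable(tenant, name):
    """Housekeeping guard: refuse to erase another tenant's resource."""
    if not name.startswith(tenant_prefix(tenant)):
        raise PermissionError(f"{name!r} does not belong to tenant {tenant!r}")
    return name


# Hypothetical tenants, channel and resource kind.
indices = [
    resource_name("acme", "input", "events"),
    resource_name("globex", "input", "events"),
]
print(owned_by("acme", indices))
```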

Platform¶

A platform is the set of servers hosting the OSS components integrated in the punchplatform, together with the punch-specific components. These servers execute and monitor your applications and store their data, using a single configuration tree.

You get to decide how many platforms you split your whole system into. It is often good practice to have separate 'platforms' for different sites, because it helps operate the sites independently in case of a network interruption or a site disaster.

Therefore, a local collection/LTR site is often one platform, while a central back-office processing and storing data is another one. Each platform has its own identifier, so as to differentiate the metrics and monitoring data it produces, even when they are forwarded to and centralized in a single central monitoring system.

Go through the Platform overview.

Configuration structure¶

Each platform has a configuration, best viewed as a structured filesystem tree.

Go through the Configuration overview for a peek at its structure, which matches the concepts we have seen.

End-to-end Demonstration¶

With the trainer sharing their screen, let us see punch in action:

a (standalone) platform (or a pre-deployed training platform)
a complete tenant configuration tree
starting the platform and the tenant channels
injecting logs and viewing parsed logs in Kibana
viewing dashboards on cyber data
a look at the monitoring dashboards for the channels and the platform

Security patterns and features¶

Punch addresses security risks through several optional features.

Key points

tenant concept at the heart of our design (useful even in a single-customer context, to separate the business level from the platform-management level)
SSL capabilities on connector punchline nodes (see the Reference guide)
Kibana-Elastic and Elastic-Elastic SSL (through the Opendistro security plugin)
N-tiering / filtering capability through the Punch Gateway
RBAC enforcement at Kibana/ES level (through OpenDistro security) or inter-tenant isolation through separate Kibana instances + modsecurity rulesets