Quick Tour

Abstract

The goal of this chapter is simple: make clear, in a five-minute read, what the punch is.

Concepts

(figure: the punch concepts at a glance)

  • a punchline is a data processing pipeline.
    • Stream, batch, Storm, Spark, Flink and Go pipelines are all supported through a single declarative format. The range of applications you can implement with punchlines is extremely broad, from tiny embedded pipelines to large-scale distributed data processing.
    • A punchline must be included in a channel in order to be executed.
  • a plan is a periodic data processing pipeline.
    • A plan is basically a punchline that is executed periodically, each run configured to consume a specific range of data.
    • A plan is the simplest and most common example of data processing workflow. It is so common and useful that the punch provides a dedicated concept to deal with it.
    • Plans are resilient. If something bad happens, a plan resumes and keeps processing the data from where it was interrupted.
  • a book is an arbitrary workflow.
    • Books leverage the power of the Kubernetes Argo workflow engine.
    • Books can chain arbitrary punchlines, react to incoming events, etc.
  • a function is not a punch concept per se. It is listed here to emphasize that you can code your own function (Java, Python, soon Go) and deploy it as part of a punchline.
  • a channel is the punch execution unit.
    • Channels support four operations: start, stop, status and reload, as illustrated just after this list.
    • Channels can include one or several applications: punchlines, plans, books or your own containerised application.
    • Grouping several applications in a channel is simply a convenience for managing large-scale platforms with many (tens or hundreds) of applications.
  • a tenant is a logical grouping of channels.
    • Using the punch you always work as part of a tenant.
    • Just like channels, tenants support the start, stop, status and reload operations.
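
Assuming the stop, status and reload operations follow the same flag style as the channelctl --start command shown later in this chapter, the lifecycle of a channel (here the predict channel defined below) looks like:

channelctl --start predict
channelctl --status predict
channelctl --stop predict
channelctl --reload predict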

With these concepts you can design a variety of data processing applications in a matter of hours. One family of applications is central to the punch: log management. For it, the punch provides an additional concept: log parsers. The punch ships with many default parsers, a marketplace to pick from, and a complete development toolkit to code your own.

Examples

Ingesting Data

The point of using the punch is to design applications, in particular data-centric applications. Here is an application that processes data from (say) radar-related equipment. Central to that application are a few punchlines that take care of the various processing stages: ingesting, filtering, detecting, etc.

(figure: a radar data processing application and its punchlines)

To define the simplest channel you need two configuration files. The first models the application itself: in this example, the application processes Kafka data with a third-party (Python) node that computes (say) some predictions.

Here is the punchline configuration file:

version: "7.0"
name: predicter
runtime: pyspark
settings:
  resources:
  - punch-pex:com.mycompany:my-python-prediction-library:3.0.0
dag:
- type: kafka_input
  settings:
    topic: input
  publish:
  - stream: data
    fields:
    - temperature
    - radar-id
- type: detection
  settings:
    parameter: 
  subscribe:
  - component: kafka_input
    stream: data
  publish:
  - stream: data
    fields:
    - temperature
    - radar-id
    - prediction
- type: elasticsearch_output
  settings:
    per_stream_settings:
    - stream: data
      index:
        type: daily
        prefix: mytenant-events-
  subscribe:
  - component: detection
    stream: data

Let us now define the channel_structure.yaml file:

version: "7.0"
name: predict
applications:
- name: predicter
  runtime: kubernetes
  cluster: west
  command: punchlinectl
  args:
  - "start"
  - "--punchline"
  - "predicter.yaml"
  - "--runtime"
  - "pyspark"

Hopefully that file is simple to understand: it tells the punch to submit the punchline to a given Kubernetes cluster, together with the arguments required to start it. To start your channel, simply type:

channelctl --start predict

Check your Kubernetes cluster: there will be one or more pods running, depending on your punchline runtime (Spark, Flink, Storm, etc.). That is plumbing and should not concern you.
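
For example, assuming your channel applications land in a dedicated namespace (the namespace name here is a placeholder):

kubectl get pods --namespace mytenant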

Note that while this looks simple and great, it is not the simplest or most convenient way to test, debug or design your punchline. In development mode you will prefer to start your punchline as a plain foreground application, from a terminal or directly from within your code editor. You can do that like this:

punchlinectl start --punchline predicter.yaml

It could not be simpler: simple to understand, simple to work with, simple to tune.

Log Management

A log management platform is just another use case, illustrated below. It typically includes long-term archiving capabilities.

(figure: a log management platform)

Note that implementing a log management platform on top of open-source technologies poses serious difficulties. First, you need to assemble many components (Elasticsearch, Kafka, etc.). Second, you must have log parsers, preferably ready-to-use, or at least a development kit to design your own. Last, you need (many) additional configurations and services: long-term archiving, log collection at the edge, site-to-site log transfer, etc.

The punch provides all of that. If you have access to the internal Thales GitLab inner-source area, check out our parser space.

To quickly grasp how this is achieved, here is a log punchline example that leverages the standard punch log parsers.

version: "7.0"
name: sourcefire-parser
runtime: punch
settings:
  resources:
  - punch-parser:org.thales.punch:punch-core-parsers:1.0.0
dag:
- type: syslog_input
  settings:
    listen:
      proto: tcp
      host: 0.0.0.0
      port: 9902
  publish:
  - stream: logs
    fields:
    - log
- type: punchlet_node
  settings:
    punchlet:
    - common/syslog_header_parser.punch
    - sourcefire/parser.punch
  subscribe:
  - component: syslog_input
    stream: logs
  publish:
  - stream: logs
    fields:
    - log
- type: elasticsearch_output
  settings:
    per_stream_settings:
    - stream: logs
      index:
        type: daily
        prefix: mytenant-events-
      document_json_field: log
  subscribe:
  - component: punchlet_node
    stream: logs

Tip

Note here the import of the punch parser package punch-parser:org.thales.punch:punch-core-parsers:1.0.0. You can of course provide your own. That package provides a number of parsing functions (i.e. punchlets), for example common/syslog_header_parser.punch. You can chain these in many ways.
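
To try this punchline out, start it in the foreground as shown earlier, then send it a sample log over TCP, for instance with netcat against the 9902 listening port configured above (the log line is a made-up sample):

echo "<189>Jan 01 12:00:00 myhost sourcefire: sample event" | nc localhost 9902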

Developing on the Punch

One of the key characteristics of the punch is to require little coding. Equipped with a few punchlets (snippets of code written in the punch language) and SQL statements, punch users can design impressive and complete applications in a matter of hours. These applications are, in turn, easy to maintain and operate.
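
To give you an idea, here is a minimal, hypothetical punchlet. It uses the bracket notation punchlets rely on to navigate the traversing data, and simply adds an enrichment field (the site name is a placeholder):

{
    // tag every traversing document with the processing site
    [logs][log][site] = "paris-dc1";
}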

The punch also lets you implement your own functions. Using a function-as-a-service approach, you can provide your own business modules and make them part of punchlines by combining yours with the many provided by the punch.

Here is the punch design illustrated:

(figure: the punch design)

Summary

It takes something like an hour to understand the few punch concepts: tenants, channels, punchlines, plans, books, punchlets and parsers. We tried our best not to invent unnecessary concepts; there are already too many: pods, workflows, jobs, applications, schedulers, pipelines, containers, images, CI/CD, etc., and too many technologies: Kubernetes, Argo, Kafka, ClickHouse, Elasticsearch, S3, MLflow, Kubeflow, Airflow, JupyterHub, DockerHub, Helm registries, etc.

Punch concepts come from a simple motivation: ensure we can efficiently help our customers. When a punch forward-engineer rescues a customer on their platform, dealing only with channels and punchlines makes it a lot easier to help. These are robust and bug-free. If something is not working well, it will not take long to identify the issue, most often an infrastructure or misconfiguration issue.

In the end, the concepts that are useful for providing support are also useful for users and customers. Who wants to configure YAML files, struggle with CI/CD, define templates? In our view, all that is hell. Hence the punch.