The Punch lets you define your complete platform and applications using simple configuration files. These files follow a straightforward layout, organized per tenant and then per channel.
This chapter explains the overall configuration structure.
Every application is defined as part of a tenant. In each tenant, you further organize your processing units into channels. A channel is thus a set of applications that you can start or stop together.
More precisely, Punch applications can be:
- Punchlines: these cover streaming or batch data processing pipelines. The punch supports various runtime engines for these.
- Additional ready-to-use punch applications, for example to take care of data housekeeping.
- Third-party applications such as ElastAlert or logstash, fully integrated for you in the Punch.
- Your own applications.
Here is the overall view of how the punch allows a user to submit different types of applications to various processing engines.
- A repository: a (typically) shared folder equipped with revision control capabilities. This repository contains the platform and application structural configuration files. It does not contain the actual functions (parsers, jars, python files, etc.) that hold the application logic.
- An administrative console, shown to illustrate how a terminal user can start an application.
- (Typical) examples of applications launched and managed by the punch:
  - a punch native punchline used for streaming use cases
  - a spark punchline used for batch or streaming analytics processing
  - a third-party application. An example is a logstash process, but it could be your own.
On Kubernetes, punch applications are directly submitted as containers. On native punch an equivalent orchestrator is used (shiva) but the principles are identical.
Here is the layout of a punch platform configuration tree.
└── conf
    ├── resources
    │   ├── elasticsearch
    │   │   ├── templates
    │   │   └── example_requests
    │   └── kibana
    │       ├── dashboards
    │       ├── cyber
    │       ├── other
    │       └── platform
    └── tenants
        ├── customer1
        │   └── channels
        │       ├── admin
        │       │   ├── channel_structure.yml
        │       │   └── housekeeping_punchline.yml
        │       └── apache_httpd
        │           ├── archiving_punchline.yml
        │           ├── channel_structure.yml
        │           └── parsing_punchline.yml
        └── platform
            └── channels
                ├── housekeeping
                │   ├── channel_structure.yml
                │   └── elasticsearch-housekeeping.json
                └── monitoring
                    ├── channels_monitoring.json
                    ├── channel_structure.yml
                    ├── local_events_dispatcher.yml
                    └── platform_health.json
conf is a sample folder that illustrates what you, the platform user, see and work with. The location of that folder is defined by the $PUNCHPLATFORM_CONF_DIR environment variable.
tenants contains the per-tenant configurations. Remember that everything is defined in a tenant.
- the reserved platform tenant is used for platform-level applications. Some monitoring or housekeeping tasks are typically defined at that level. Typically only administrators have access to that tenant and the related resources (e.g. Elasticsearch indices, Kafka topics, Kibana or UI servers).
- here the customer1 tenant is a fictitious end-user example.
channels: within a tenant, all applications are organised into channels, grouped in whatever way suits the users.
channel_structure.yml: each channel (for instance 'admin' or 'apache_httpd') is defined using this file. It basically defines the channel content.
punchlines.yml: individual applications are defined by punchlines.
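The tenant and channel layout described above can be explored with a few lines of Python. This is only an illustrative sketch, not a punch tool: the function name `list_channels` is hypothetical, and it assumes that a folder counts as a channel when it sits under `tenants/<tenant>/channels/` and holds a `channel_structure.yml` file, as in the tree shown earlier.

```python
import os
from pathlib import Path


def list_channels(conf_dir: Path) -> dict[str, list[str]]:
    """Map each tenant name to the sorted names of its channels.

    Assumption: a channel is a sub-folder of <conf>/tenants/<tenant>/channels
    that contains a channel_structure.yml file.
    """
    channels_by_tenant: dict[str, list[str]] = {}
    tenants_dir = conf_dir / "tenants"
    if not tenants_dir.is_dir():
        return channels_by_tenant
    for tenant_dir in sorted(p for p in tenants_dir.iterdir() if p.is_dir()):
        channels_dir = tenant_dir / "channels"
        candidates = channels_dir.iterdir() if channels_dir.is_dir() else []
        channels_by_tenant[tenant_dir.name] = sorted(
            c.name for c in candidates if (c / "channel_structure.yml").is_file()
        )
    return channels_by_tenant


if __name__ == "__main__":
    # Fall back to a local "conf" folder if the variable is not set.
    conf_dir = Path(os.environ.get("PUNCHPLATFORM_CONF_DIR", "conf"))
    for tenant, channels in list_channels(conf_dir).items():
        print(tenant, channels)
```

Run against the sample tree, this would list the 'admin' and 'apache_httpd' channels for customer1, and the 'housekeeping' and 'monitoring' channels for the platform tenant.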
Each punchline references additional important artefacts such as punch parsers or functions (i.e. punchlets), resource files (for data enrichment use cases), and user binary artefacts (pex or jar files). These are typically installed separately using an artefact package manager such as maven, as explained hereafter. Here is how a punchline references such functions:
resources:
  - punch-parsers:org.thales.punch:punch-core-parsers:1.0.0
  - punch-java-node:com.yourcompany.nodes:alert-nodes:1.0.0
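To make the shape of these entries concrete, here is a minimal Python sketch that splits one entry into its four colon-separated fields. The field names (`kind`, `group_id`, `artefact_id`, `version`) are assumptions chosen for illustration, by analogy with maven coordinates; the `ResourceRef` class and `parse_resource` function are hypothetical and not part of the punch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRef:
    # Field names are illustrative assumptions, not official punch terms.
    kind: str
    group_id: str
    artefact_id: str
    version: str


def parse_resource(entry: str) -> ResourceRef:
    """Split a 'kind:groupId:artefactId:version' entry into its four fields."""
    parts = entry.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty ':'-separated fields, got {entry!r}")
    return ResourceRef(*parts)


ref = parse_resource("punch-parsers:org.thales.punch:punch-core-parsers:1.0.0")
print(ref.group_id, ref.artefact_id, ref.version)
```

Rejecting malformed entries early, rather than guessing missing fields, keeps a typo in a punchline from silently resolving to the wrong package.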
To sum up:
- A configuration tree holds the complete definition of applications and how they are grouped and orchestrated together. Most punch applications are punchlines that internally reference functions and resources.
- These functions and resources (parsers, arbitrary functions, ML models, nodes, etc.) are delivered as additional packages and libraries through an artefact repository. That repository exposes a robust and standard
- In addition to the punch off-the-shelf functions, you can provide your own as long as they are provided as standard packages (e.g. a maven artefact for java).
- In addition to the punch off-the-shelf applications, you can provide your own as long as they are provided as containers.
Development Configuration Tree
Going through an artefact repository as just explained is robust, production-ready, and can be integrated into your CI/CD.
That said, it is handier and easier to work directly with plain files when testing or developing. Here is a view of a development punch (in fact the punch standalone package). It comes equipped with local punchlets.
└── conf
    ├── resources
    │   ├── elasticsearch
    │   │   ├── templates
    │   │   └── example_requests
    │   └── kibana
    │       ├── dashboards
    │       ├── cyber
    │       ├── other
    │       └── platform
    └── tenants
        ├── customer1
        │   ├── channels
        │   │   ├── admin
        │   │   │   ├── channel_structure.yml
        │   │   │   └── housekeeping_punchline.yml
        │   │   └── apache_httpd
        │   │       ├── archiving_punchline.yml
        │   │       ├── channel_structure.yml
        │   │       └── parsing_punchline.yml
        │   └── resources
        │       └── punch
        │           └── org
        │               └── thales
        │                   └── punch
        │                       ├── apache_httpd
        │                       │   ├── enrichment.punch
        │                       │   ├── http_codes.json
        │                       │   ├── normalization.punch
        │                       │   ├── parsing.punch
        │                       │   └── taxonomy.json
        │                       └── common
        │                           ├── geoip.punch
        │                           ├── head_parser.punch
        │                           └── syslog_header_parser.punch
        └── platform
            ├── channels
            │   ├── housekeeping
            │   │   ├── channel_structure.yml
            │   │   └── elasticsearch-housekeeping.json
            │   └── monitoring
            │       ├── channels_monitoring.json
            │       ├── channel_structure.yml
            │       ├── local_events_dispatcher.yml
            │       └── platform_health.json
            └── resources
                └── punch
                    └── org
                        └── thales
                            └── punch
                                └── monitoring_dispatcher.punch
The only additional files are located under each tenant's 'resources/punch' folder. There you find a set of parsers and resource files. With such a layout, the local punchlets are used instead of the ones delivered from the artefact repository.
The tenant 'resources/punch' layout is strictly equivalent to the artefact repository layout used in production. If that layout works locally, it will work in production as long as you deliver your packages using the same groupId/artefactId tree structure, here org/thales/punch/apache_httpd.
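The correspondence between a groupId/artefactId pair and the local folder can be sketched as follows. This is an illustration under assumptions, not the punch resolution logic: the helper name `local_resource_dir` is hypothetical, and it assumes that dots in the groupId map to nested folders and that 'apache_httpd' plays the role of the artefactId in the example above.

```python
from pathlib import Path


def local_resource_dir(tenant_dir: Path, group_id: str, artefact_id: str) -> Path:
    """Return the tenant-local folder matching a groupId/artefactId pair.

    Assumption: each dot-separated part of the groupId becomes one folder
    level under <tenant>/resources/punch, followed by the artefactId.
    """
    return tenant_dir.joinpath("resources", "punch", *group_id.split("."), artefact_id)


path = local_resource_dir(Path("conf/tenants/customer1"), "org.thales.punch", "apache_httpd")
print(path.as_posix())
# conf/tenants/customer1/resources/punch/org/thales/punch/apache_httpd
```

Checking that this folder exists and contains the expected punchlets is a quick way to confirm that a development tree mirrors the structure you plan to deliver as a package.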
The punch takes care of propagating the various pieces and parts at application startup.
On Kubernetes, init containers are automatically generated to fetch all the required files before the application actually starts. These special containers make the files locally available under an "/opt/punch/repository" local filesystem, accessible to every starting container.
On legacy punch the same pattern is implemented using the shiva Kafka backbone.