This chapter provides a high-level roadmap together with an indicative agenda. Make sure to contact the punch team for details about any item, or if you would like to see a missing item added.
Remember our conventions to identify a release or a release candidate.
- Alpha : is a non-production-ready early release. It introduces some of the new features; other new features will be added later on. APIs and interfaces may not be stable. Alpha releases are provided for pre-production, evaluation or innovation work.
- Beta : is a release with all new features in. No new feature will be added. It is still under validation. APIs and interfaces are stable.
- Release : is a production-ready release.
- EOL means End of Life : no one should start a new project with an EOL release. It simply means newer production-ready releases are available.
- EOS means End of Support : releases that are too old are no longer supported by the punch level-3 support team. Anticipate your migration; the punch team is there to help.
Here is what it looks like in our small ascii quick view: each '|' indicates the time at which an (alpha|beta|release) is released on our web site.
```
Release   Alpha    Beta     Release   end-of-life   end-of-support
5.5       alpha|   beta|    rel|      eol|          eos|
```
Here is the current release agenda.
```
      2020                    2021                    2022
      Q1    Q2    Q3    Q4    Q1    Q2    Q3    Q4    Q1    Q2    Q3    Q4
      Mar   Jun   Sep   Dec   Mar   Jun   Sep   Dec   Mar   Jun   Sep   Dec

5.5   eol|                    eos|
5.6   eol|                    eos|
5.7               eol|                          eos|
6.0   beta| rel|  eol|        eos|              eol|
6.1         beta| rel|                          eol|
7.0               alpha|beta| rel|                                        eos|
```
The Dave 6.x release is built on top of elasticsearch 7.x. It still runs on top of jdk 1.8.
The dilemma is whether to wait for Spark 3.0, Elastic 8.x and the hadoop-elastic connectors to all be ready and running on top of jdk 11. If this happens quickly, the Dave release will only be a temporary release.
The Dave 6.1 has been defined in June/2020. Several COTS evolutions and convergence issues with corporate initiatives led us to define a new target for our stable 6.x releases. In particular :
- Kafka will soon release a version with no more zookeeper dependency. We decided to also get rid of our zookeeper dependency and leverage a new design on top of Kafka only. In turn this dramatically simplifies the punch.
- Spark3/Kubernetes : spark 3 was released in June/2020. It is now more mature, both in its python capabilities and in how easily it runs on top of a Kubernetes cluster. We decided to bring Spark3 into the scope of the 6.x releases.
- This will however be developed without impact on, or risk to, customers running spark 2.4 punchlines.
- Training and documentation material : we are working on significantly improving our online material so as to efficiently train new customers. This revamp goes along with many new features developed as part of the Thales 2020 innovation funding. Because the 6.0.x releases are in production, we decided to start this revamp at 6.1.
For all these reasons, we plan to make all 6.0.x customers migrate to the 6.1.x releases, and officially deprecate the 6.0.x.
The 7.x releases are built on top of Java 11, Spark 3.0 and elasticsearch 8.x. A beta 7.0.0 release is planned for Q1.
The 7.x release will largely leverage kubernetes, although a kubernetes-free release is still planned so as to accommodate small deployments and/or existing customers.
The features high-level agenda is as follows. Refer to each description below for details about each feature.
```
                        2020
                        Q1      Q2      Q3      Q4
                        Mar     Jun     Sep     Dec

PunchlinesOnly          alpha|  rel|
PunchResolver           alpha|  rel|
ContainerdImages        alpha|  beta|   rel|
Minio                   beta|   rel|
ClickHouse              beta|           rel|
ShivaImprovements       alpha|  rel|
RestApiGateway          beta|   alpha|  rel|
RestApiServer           beta|   alpha|  rel|
ParserCatalog           beta|   alpha|  rel|
Nifi                    beta|   alpha|  rel|
MlLifeCycle             alpha|  beta|   rel|
FeedBackGui             alpha|  beta|   rel|
CloudServices           alpha|  beta|   rel|
DataProtectionAccess    alpha|  beta|   rel|
Spark3                  alpha|  beta|   rel|
Dask                    beta|   rel|
CorrelationRuleEngine   beta|   rel|
CorrelationRuleIhm      alpha|  beta|   rel|
```
Starting at 6.x, all jobs (storm, punch, spark, python) are expressed as punchlines. This has important benefits, in particular on the HMI and front side, as all punch applications will be editable from the punchline editor.
The punchline schema is, in fact, the one of the 5.x PML files. We decided to leverage that schema for all punch applications. That said, the old-style format will be fully supported for backward compatibility. Refer to the migration guide.
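As an illustration, a punchline is a single configuration file describing a graph of input, processing and output nodes. The sketch below is illustrative only: node types, field names and values are assumptions for the example, not the authoritative punchline schema.

```json
{
  "tenant": "mytenant",
  "channel": "apache",
  "runtime": "storm",
  "dag": [
    {
      "type": "kafka_input",
      "settings": { "topic": "apache-logs" },
      "publish": [ { "stream": "logs", "fields": ["log"] } ]
    },
    {
      "type": "punchlet_node",
      "settings": { "punchlet": "parsers/apache.punch" },
      "subscribe": [ { "component": "kafka_input", "stream": "logs" } ],
      "publish": [ { "stream": "logs", "fields": ["log"] } ]
    },
    {
      "type": "elasticsearch_output",
      "settings": { "index": "apache-logs" },
      "subscribe": [ { "component": "punchlet_node", "stream": "logs" } ]
    }
  ]
}
```

Because every engine consumes the same schema, the same editor and tooling apply to storm, spark and python punchlines alike.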
The punch resolver isolates users from low-level security or access-point configuration. It is required to deploy applications on secured multi-tenant platforms, so as to relieve and protect functional users from low-level configuration items.
It is a significant improvement over the existing templating-based punch approach.
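The resolver idea can be sketched as follows. This is a minimal illustration, not the actual punch implementation: rule keys, setting names and endpoints are hypothetical.

```python
# Minimal sketch of the resolver idea: platform operators maintain rules
# mapping node types to the low-level security settings that functional
# users should never have to write themselves.

RESOLVER_RULES = {
    # hypothetical rule: every elasticsearch node gets the secured endpoint
    "elasticsearch": {
        "http_hosts": ["https://es.internal:9200"],
        "ssl": {"truststore": "/opt/certs/truststore.jks"},
    }
}

def resolve(punchline: dict) -> dict:
    """Return a copy of the punchline with platform settings injected."""
    resolved = {**punchline, "dag": []}
    for node in punchline.get("dag", []):
        node = {**node, "settings": dict(node.get("settings", {}))}
        for key, extra in RESOLVER_RULES.items():
            if key in node.get("type", ""):
                # user-provided settings win; the resolver only fills gaps
                for k, v in extra.items():
                    node["settings"].setdefault(k, v)
        resolved["dag"].append(node)
    return resolved
```

The user punchline stays free of endpoints and certificates; the resolver merges them in at deployment time.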
The Punch is highly modular and is often deployed on top of existing managed services. Starting at version 6.1, the Punch API Gateway and Kibana plugins are provided as container images. You can deploy these directly into your Docker, Rancher or Kubernetes infrastructure.
The S3 minio store has been added in the 6.0 punch.
Clickhouse is being evaluated. Punch already provides the required IO connectors (output nodes).
Shiva is now running in production and has turned out to be a key service to run big data, log management or machine learning use cases on on-premise infrastructure. We decided to improve Shiva with a few simple but important capabilities:
- resource aware scheduling
- smooth service restarts upon upgrade
- one shot application execution
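The first capability, resource-aware scheduling, can be sketched as follows. This is an illustrative toy, not Shiva's actual code: the data structures and placement policy are assumptions for the example.

```python
# Illustrative sketch of resource-aware scheduling: each worker advertises
# its free capacity, each application declares its needs, and the scheduler
# places the application on a worker that can actually fit it.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Worker:
    name: str
    cpu: float                 # free cores
    memory_mb: int             # free memory
    apps: List[str] = field(default_factory=list)

def schedule(app: dict, workers: List[Worker]) -> Optional[Worker]:
    """Place app on the fitting worker with the most free memory."""
    candidates = [w for w in workers
                  if w.cpu >= app["cpu"] and w.memory_mb >= app["memory_mb"]]
    if not candidates:
        # no capacity: the app stays queued instead of overloading a node
        return None
    best = max(candidates, key=lambda w: w.memory_mb)
    best.cpu -= app["cpu"]
    best.memory_mb -= app["memory_mb"]
    best.apps.append(app["name"])
    return best
```

The point is the refusal path: a resource-aware scheduler queues an application rather than degrading the workers already running production punchlines.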
An API gateway is required for several reasons.
- Security : to prevent Kibana and external clients from directly accessing databases or sensitive services such as Zookeeper, Elasticsearch or Clickhouse.
- Services : to provide all Punch services, like starting or stopping a channel, or submitting a punchline or ML application for execution.
- Protection : to analyse user-submitted requests and take actions so as to protect users from dangerous operations.
- LifeCycle : enrichment files, parsers and machine learning models are examples of important resources deployed by users in their punchlines. Managing these, together with their versioning and lifecycle, requires a REST server and a well-designed REST API and schema.
A particularly important REST server use case is to expose resources such as parsers, models and enrichment files. This is for example used by customers to automatically upload new enrichment files, which are then automatically reloaded into running punchlines without service interruption.
The resource server is the backend that provides these services. The Punch REST server leverages elasticsearch, S3 and a POSIX file system to store these resources.
This item is in fact more than a server; it includes:

1. a REST API
2. REST API libraries
3. an actual server
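The versioned-resource lifecycle behind this API can be sketched as follows. Class and method names are illustrative, not the punch API; a real deployment would expose the same operations over REST (e.g. hypothetical `PUT /resources/{id}` and `GET /resources/{id}?version=n` endpoints).

```python
# Sketch of the resource lifecycle idea: each upload of a resource creates
# a new immutable version, and consumers fetch either the latest version
# (e.g. a running punchline hot-reloading an enrichment file) or a pinned one.

from typing import Dict, List, Optional

class ResourceStore:
    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}

    def upload(self, resource_id: str, payload: bytes) -> int:
        """Store a new immutable version; return its 1-based version number."""
        versions = self._versions.setdefault(resource_id, [])
        versions.append(payload)
        return len(versions)

    def fetch(self, resource_id: str, version: Optional[int] = None) -> bytes:
        """Return the latest version, or a specific pinned version."""
        versions = self._versions[resource_id]
        return versions[-1] if version is None else versions[version - 1]

store = ResourceStore()
store.upload("enrichment/geoip.csv", b"1.2.3.4,Paris\n")
v2 = store.upload("enrichment/geoip.csv", b"1.2.3.4,Paris\n5.6.7.8,Lyon\n")
```

Immutable versions are what make no-interruption reloads safe: a punchline mid-read keeps its version while new traffic picks up the latest one.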
The punch provides a number of ready-to-use parsers. The new ParserCatalog will expose these through an improved GUI, in order for cybersecurity users to view, play with, or deploy parsers into production.
Starting at 6.x, the punch provides a Nifi processor so as to run parsers in a Nifi runtime. This processor is improved in several ways so as to provide the same rich set of features as in the punch or storm engines.
Ml Life Cycle
The Mlflow component is integrated and provides dedicated and additional facilities to manage machine learning models. Note that the models themselves are stored in the Punch resource catalog.
The so-called feedback Gui will provide users with an ergonomic way to interact with data produced by machine learning applications. It will let them label the data and identify false positives, so that the overall application can continuously recalibrate its models.
Punch deployed on Cloud will benefit from the containerized punch services (already cited in a separate feature). What is also required are easier, ready-to-use services for:
- creating a tenant
- deploying automatically the Kibana plugins and punch Guis
- naming and exposing the customer data, as well as some of its internal monitoring data
- automatically wiring the customer's security RBAC attributes.
These are referred to as Cloud services.
Data Protection Access
One of the Punch REST Gateway services consists in exposing inner elasticsearch data directly through a REST API. This however requires safe strategies so as to protect users from submitting dangerous queries to the backend (elasticsearch or clickhouse for instance). This item consists in using machine learning strategies to improve the classification of user requests, in turn allowing the Punch gateway to accept or refuse the incoming requests.
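A simple rule-based baseline conveys the idea; the roadmap item replaces such hand-written rules with a learned classifier. The keys and thresholds below are illustrative assumptions, not the gateway's actual policy.

```python
# Sketch of request protection: classify an elasticsearch-style query body
# as 'accept' or 'refuse' before it ever reaches the backend. A rule-based
# baseline; the roadmap item would learn this decision instead.

DANGEROUS_KEYS = {"script", "script_score"}  # arbitrary script execution
MAX_SIZE = 10_000                            # illustrative result-size cap

def classify(query: dict) -> str:
    """Return 'accept' or 'refuse' for a query body."""
    def has_dangerous(node) -> bool:
        if isinstance(node, dict):
            if DANGEROUS_KEYS & node.keys():
                return True
            return any(has_dangerous(v) for v in node.values())
        if isinstance(node, list):
            return any(has_dangerous(v) for v in node)
        return False

    if query.get("size", 0) > MAX_SIZE:
        return "refuse"
    if has_dangerous(query):
        return "refuse"
    return "accept"
```

A learned classifier would take the same input (the request body, plus user and tenant context) and produce the same accept/refuse decision, but generalize beyond hand-written rules.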
Spark3 will be fully integrated and offered in alpha|beta releases quickly, so as to allow upfront prototyping and testing by the various data science teams.
The Dask distributed python engine is planned for integration in the punch.
The punch already includes the ElastAlert engine. This integration has been fully revamped so as to make elastalert a punch service like any other, executed in shiva and soon just as easily in kubernetes.
We plan to leverage the elastalert yaml rule format to provide additional capabilities such as machine learning alerting.
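For reference, an elastalert rule is a plain YAML file. A minimal frequency rule looks like the following; the rule name, index pattern, field names and email address are illustrative:

```yaml
# Fire an alert when 10 failed ssh logins occur within 5 minutes.
name: ssh_bruteforce
type: frequency
index: mytenant-logs-*
num_events: 10
timeframe:
  minutes: 5
filter:
- term:
    action: "ssh_login_failure"
alert:
- "email"
email:
- "soc@mycompany.com"
```

Additional capabilities such as machine learning alerting would plug into this same format, for example as new rule `type` values.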
A rule engine IHM is planned to allow end-users to edit, save and load their rules from the IHM. This IHM will be fully integrated with the RBAC protections.