Breaking changes in 6.0¶
This section discusses the changes that you need to be aware of when migrating your platform from a 5.x (CRAIG) release to 6.x.
6.0.0 - Release Candidate
This version is a Release Candidate.
Although most changes have been made, significant changes may still occur.
This version is still being validated and should be considered unstable. Please report any difficulties you encounter.
The essential improvements and changes are the following:
- PML and topologies are now supported through a single generic format called punchlines.
- No more spouts, bolts, topologies, PML nodes, etc.: only punchlines with nodes.
- Python is now officially supported; punchlines can be executed using pyspark or spark.
- Python nodes can be interleaved with java nodes.
- Python 3 is a requirement.
- Pex packaging is used to distribute python punchlines for execution.
- Elastic has been upgraded to the 7.x major release.
Make sure you go through the rest of this guide.
Requirements¶
The Punch platform is supported exclusively with:
- Java 8
- Python 3.6.8 or later
- Ansible 2.9.0
OS supported¶
Some older operating systems are no longer officially supported. The installation may still work on them, but without warranty. Version 6.x focuses on the LTS operating systems from 2018 to 2020.
Officially supported:
- Ubuntu 18.04 LTS (EOL April 2023)
- CentOS 8 (EOL 2029)
- Debian 8 (EOL June 2020)
- Debian 9 (EOL June 2022)
- Ubuntu 16.04 LTS (EOL April 2021)
- CentOS 7 (EOL 2024)
System daemon¶
systemd replaces supervisord as the single system daemon for deployed components.
Command lines¶
Starting at 6.0 the punch commands have been revamped. Use channelctl, planctl and punchlinectl from now on.
- channelctl replaces the 5.x punchctl.
- planctl lets you start a foreground plan, and provides operational commands to reset the plan cursors.
- punchlinectl lets you start a punchline whatever its type (spark, punch, pyspark, storm).
- sparkctl is an internal client tool used to effectively submit spark punchlines and plans. Prefer going through punchlinectl, which provides you with auto-completion and parameter checking. sparkctl is meant to be an internal script.
- punchpkg is a new development tool to help you package your custom python nodes as part of the punch library of nodes. This tool is a standalone tool only.
Important
On production systems only channelctl makes sense. The punchlinectl tool is rather intended for punchline developers.
The following commands have been deprecated:
- punchplatform-channel.sh: use channelctl instead.
- punchplatform-pyspark.sh: already deprecated from 5.7.x. Use punchlinectl and planctl instead.
- punchplatform-analytics.sh: already deprecated from 5.7.x. Use punchlinectl and planctl instead.
- punchplatform-topology.sh: use punchlinectl instead.
- punchctl is replaced by channelctl. Note that the --job option has been replaced by --application.
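When auditing existing operational scripts, the replacements listed above can be captured in a small lookup table. The following is a minimal sketch, not a tool shipped with the punch: the helper name find_deprecated_commands and its whole-word matching strategy are assumptions made for illustration only.

```python
import re

# Deprecated 5.x commands and their 6.0 replacements, as listed above.
REPLACEMENTS = {
    "punchplatform-channel.sh": ["channelctl"],
    "punchplatform-pyspark.sh": ["punchlinectl", "planctl"],
    "punchplatform-analytics.sh": ["punchlinectl", "planctl"],
    "punchplatform-topology.sh": ["punchlinectl"],
    "punchctl": ["channelctl"],
}


def find_deprecated_commands(script_text):
    """Return (line_number, deprecated_command, replacements) tuples.

    Hypothetical helper: it only reports occurrences of the deprecated
    command names and does not try to rewrite the script.
    """
    findings = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for old, new in REPLACEMENTS.items():
            # The lookarounds avoid matching a name embedded in a longer
            # word (for example "mypunchctl").
            pattern = r"(?<![\w.-])" + re.escape(old) + r"(?![\w.-])"
            if re.search(pattern, line):
                findings.append((number, old, new))
    return findings
```

Each finding still has to be reviewed by hand, in particular the punchctl calls that used the --job option, which is now --application.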
reload¶
The 'reload' channel command has been disabled in 6.0. Use only stop and start on the selected punchlines. This function will be restored in 6.1 and subsequent releases.
Channel Structure File Format¶
- The jobs property is replaced by applications.
- The version identifier is expected to be "6.0"
- Previous versions will be automatically translated at runtime. This should not prevent you from switching to the new format as quickly as possible.
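The two format changes above can be illustrated with a short sketch. It assumes that the channel structure has been parsed into a dictionary and that version and jobs are top-level keys; the function name migrate_channel_structure is hypothetical and is not part of the punch tooling.

```python
import copy


def migrate_channel_structure(structure):
    """Return a copy of a 5.x channel structure updated for 6.0.

    Assumption: "version" and "jobs" are top-level keys of the parsed
    channel structure file.
    """
    if "jobs" in structure and "applications" in structure:
        raise ValueError("structure defines both 'jobs' and 'applications'")
    migrated = copy.deepcopy(structure)
    if "jobs" in migrated:
        # The jobs property is replaced by applications.
        migrated["applications"] = migrated.pop("jobs")
    # The version identifier is expected to be "6.0".
    migrated["version"] = "6.0"
    return migrated
```

The input dictionary is left untouched, so the old and new structures can be compared before the file is rewritten.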
Punchline Configuration and Format¶
The major improvement in 6.0 is a single format for all sorts of punchlines. No more topologies with spouts and bolts. Every punchline, storm or spark, stream or batch, can be expressed with the same new format.
Important
To ease migrations and upgrades, the old formats, in particular topologies and 5.x pml formats, are still supported. In addition, punchlinectl provides a convert option to automatically convert old-style formats to the new one.
Metrics update¶
- Metric spark.punch.job.deploy.mode is replaced by spark.punch.application.deploy.mode
- Metric platform.job is replaced by platform.application
- Metric job.deploy.mode is replaced by application.deploy.mode
- Metric job.recovery.time_zone is replaced by application.recovery.time_zone
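If you maintain dashboards or scripts that refer to the old metric names, the renames above can be expressed as a lookup table. This is only a sketch: the helper names are hypothetical, and it assumes that metrics are handled as flat dictionaries keyed by metric name.

```python
# Old metric names and their 6.0 replacements, as listed above.
METRIC_RENAMES = {
    "spark.punch.job.deploy.mode": "spark.punch.application.deploy.mode",
    "platform.job": "platform.application",
    "job.deploy.mode": "application.deploy.mode",
    "job.recovery.time_zone": "application.recovery.time_zone",
}


def rename_metric(name):
    """Return the 6.0 name of a metric, or the name unchanged."""
    return METRIC_RENAMES.get(name, name)


def rename_metric_fields(document):
    """Return a copy of a flat metric document using the 6.0 names."""
    return {rename_metric(key): value for key, value in document.items()}
```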
Components¶
Elastic upgrade¶
In this Punchplatform version, we upgraded the Elastic components to version 7, which introduces new features and some breaking changes compared to Elastic 6.
In particular, mapping types (the _type metadata field) are deprecated. Consequently we removed the type parameter from all nodes that interact with Elasticsearch, such as the Elastic_output node.
You can find the full list of breaking changes between these two Elastic major versions in: Breaking changes 7.0
Storm Punchlines¶
User Defined Nodes¶
The support of user defined storm nodes has been significantly improved. Refer to the new Storm Punchline User Node chapter.
Starting at 6.1, the way you provide your node is completely different. Instead of providing an uber jar that required you to encapsulate all punch dependencies, you now only provide a lightweight jar with only your classes.
This is a breaking change, however: you must update both your custom node maven projects and the punchline configurations.
Spark Punchlines¶
Pyspark nodes use pex as the single node package.
punchlinectl is the main analytics command.
Pyspark¶
The Pyspark package has been refactored to punchline_python. Imports of our public API for developing custom nodes must therefore now be prefixed with punchline_python as root module.
# old in 5.x or earlier
from core.holders.input_holder import InputHolder
# new in 6.0 or later
from punchline_python.core.holders.input_holder import InputHolder
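For custom nodes with many imports, the prefix change can be applied mechanically. The sketch below is not part of punchpkg: the helper name prefix_core_imports is hypothetical, and it assumes that the old public API was imported from a top-level core package, as in the example above.

```python
import re

# Matches "from core..." and "import core..." at the start of a line,
# but not already migrated imports or unrelated names such as "corelib".
_OLD_IMPORT = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)core(?=[.\s]|$)", re.MULTILINE
)


def prefix_core_imports(source):
    """Return the source text with old core imports moved under punchline_python."""
    return _OLD_IMPORT.sub(r"\1punchline_python.core", source)
```

Running the function on already migrated code leaves it unchanged, so it can be applied repeatedly to the same file.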
Modules:
We no longer bundle modules like numpy, pandas, scikit-learn, etc. by default. Reason: we do not want to prevent users from upgrading these modules to their latest release.
# An example of how to write custom nodes and include your custom modules with our new punchpkg CLI
https://github.com/punchplatform/starters/tree/6.x/pyspark/custom_node_python
Punch Kibana Plugin¶
The Punch Kibana plugin now uses the REST API exposed by the punch REST Gateway. The configuration of Kibana (kibana.yaml) and of punchplatform.properties has changed accordingly.
Rest API Gateway¶
The REST API routes use a different endpoint for each tenant service:
- Elasticsearch forwarding:
/es/<es_cluster_id>/*
- Punchline executions:
/punchline
/punchline/save
/punchline/scan
/punchline/<punchline_id>
/punchline/<punchline_id>/executions/<execution_id>
/punchline/executions
/punchline/executions/<execution_id>
- Registry for resource hosting:
/registry
/download/<resource_name>
/metadata/<resource_name>
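As an illustration of how the routes above combine with a gateway address, here is a small sketch that only builds URLs. It assumes that the routes are relative to a base URL such as https://server1:4242/v1; the function names are hypothetical, and the HTTP methods accepted by each route are not covered here.

```python
from urllib.parse import quote


def punchline_execution_url(base_url, punchline_id, execution_id):
    """Build the URL of /punchline/<punchline_id>/executions/<execution_id>."""
    return "{}/punchline/{}/executions/{}".format(
        base_url.rstrip("/"),
        quote(punchline_id, safe=""),
        quote(execution_id, safe=""),
    )


def es_forward_url(base_url, es_cluster_id, es_path):
    """Build the URL of an Elasticsearch request forwarded through /es/<es_cluster_id>/*."""
    return "{}/es/{}/{}".format(
        base_url.rstrip("/"),
        quote(es_cluster_id, safe=""),
        es_path.lstrip("/"),
    )
```

Identifiers are percent-encoded so that spaces or slashes in an identifier cannot change the route.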
Configuration Changes¶
This chapter lists all configuration changes in detail.
punchplatform.properties¶
We are taking advantage of the major update to modify the configuration in punchplatform.properties.
Components:
- Elasticsearch
- Kibana
- Rest API Gateway
Elasticsearch¶
The Opendistro Security plugin's configuration section has changed. Some keys are different, and new SSL/TLS configuration has been added:
{
  "plugins": {
    "opendistro_security": {
      "local_ssl_certs_dir": "/tmp/certs/es/server1",
      "ssl_transport_pemkey_name": "node-key-pkcs8.pem",
      "ssl_transport_pemcert_name": "node-cert.pem",
      "ssl_transport_pemtrustedcas_name": "rootca-cert.pem",
      "admin_pemcert_name": "admin-cert.pem",
      "admin_pemkey_name": "admin-key.pem",
      "admin_pemtrustedcas_name": "cachain.pem",
      "authcz_admin_dn": ["emailAddress=admin@thalesgroup.com,CN=admin,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"],
      "nodes_dn": ["emailAddress=node@thalesgroup.com,CN=node,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"],
      "kibana_index": ".kibana-domainname"
    }
  }
}
Info
Note that local_ssl_certs_dir is now inside the punchplatform.properties file and not inside punchplatform-deployment.settings anymore.
Kibana¶
The Punchplatform plugin is now configured with the Rest API's hosts, and no longer needs the previous configuration:
{
  "plugins": {
    "punchplatform": {
      "rest_api": {
        "hosts": ["https://server1:4242/v1"]
      }
    }
  }
}
The Opendistro Security plugin's configuration section has been removed.
New SSL/TLS configuration:
{
  "kibana": {
    "domains": {
      "mydomain": {
        "local_ssl_certs_dir": "/tmp/certs/kibana/mydomain",
        "server_ssl_enabled": true,
        "server_ssl_key_name": "kibana-server-key.pem",
        "server_ssl_certificate_name": "kibana-server-cert.pem",
        "elasticsearch_ssl_enabled": true,
        "elasticsearch_ssl_verificationMode": "none",
        "elasticsearch_ssl_certificateAuthorities_names": ["server-cachain.pem"]
      }
    }
  }
}
Plugins can be configured:
- Globally, inside the kibana section
- Locally, inside each domain section, overriding global configurations
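The precedence between global and per-domain plugin settings can be sketched as follows. This is an illustration of the override rule only, not the deployer's actual logic: it assumes that a plugins section may appear both directly under kibana and under a given domain, and that a domain-level plugin entry replaces the global entry of the same name as a whole. The function name effective_plugins is hypothetical.

```python
def effective_plugins(kibana_section, domain_name):
    """Return the plugin settings that apply to one Kibana domain.

    Assumption: a domain-level plugin entry entirely replaces the global
    entry with the same plugin name (no deep merge).
    """
    global_plugins = kibana_section.get("plugins", {})
    domain = kibana_section.get("domains", {}).get(domain_name, {})
    merged = dict(global_plugins)
    # Local (per-domain) configuration overrides the global one.
    merged.update(domain.get("plugins", {}))
    return merged
```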
Info
Note that local_ssl_certs_dir is now inside the punchplatform.properties file and not inside punchplatform-deployment.settings anymore. It is also configurable per domain.
Rest API Gateway¶
The gateway configuration now provides most of the old Punch Kibana plugin configuration, with additional settings related to the new Rest API routes:
{
  "gateway": {
    "clusters": {
      "cluster1": {
        "tenant": "mytenant",
        "modsecurity_enabled": false,
        "servers": {
          "server1": {
            "inet_address": "172.28.128.21",
            "port": 4242,
            "ssl": {
              "enabled": true,
              "local_key_store_path": "/tmp/jks/gateway.keystore",
              "key_store_type": "jks",
              "key_store_password": "gateway",
              "key_alias": "gateway",
              "key_password": "gateway"
            }
          }
        },
        "elasticsearch": {
          "data_cluster": {
            "cluster_id": "es_search",
            "hosts": ["server2:9200"],
            "settings": ["es.index.read.missing.as.empty: yes", "es.nodes.discovery: true"],
            "ssl_enabled": true
          },
          "metric_cluster": {
            "cluster_id": "es_search",
            "hosts": ["server2:9200"],
            "index_name": "mytenant-metrics",
            "settings": ["es.index.read.missing.as.empty: yes", "es.nodes.discovery: true"],
            "ssl_enabled": true
          }
        },
        "services": {
          "extraction": {
            "formats": ["csv", "json"]
          },
          "registry": {
            "type": "file",
            "settings": {
              "root_path": "/home/vagrant/pp-conf/tenants",
              "create_root": true
            }
          }
        },
        "resources": {
          "resources_dir": "/home/vagrant/pp-conf/resources",
          "doc_dir": "/data/opt",
          "tmp_dir": "/tmp",
          "archives_dir": "/tmp"
        },
        "reporters": [
          {
            "type": "elasticsearch",
            "cluster_name": "es_search",
            "index_name": "mytenant-gateway-logs",
            "credentials": {
              "user": "admin",
              "password": "admin"
            },
            "ssl_enabled": true
          }
        ]
      }
    }
  }
}
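To check a gateway section such as the one above, it can help to list the address of each declared server. The following sketch walks the structure shown in the sample. The function name gateway_endpoints is hypothetical, and deriving the URL scheme from the ssl.enabled flag is an assumption made for illustration.

```python
def gateway_endpoints(properties):
    """Return (cluster_id, server_id, url) tuples for every gateway server.

    Assumption: a server is reached over https when its "ssl.enabled"
    flag is true, and over http otherwise.
    """
    endpoints = []
    clusters = properties.get("gateway", {}).get("clusters", {})
    for cluster_id, cluster in clusters.items():
        for server_id, server in cluster.get("servers", {}).items():
            ssl_enabled = server.get("ssl", {}).get("enabled", False)
            scheme = "https" if ssl_enabled else "http"
            url = "{}://{}:{}".format(scheme, server["inet_address"], server["port"])
            endpoints.append((cluster_id, server_id, url))
    return endpoints
```

The input is expected to be the parsed content of punchplatform.properties, for example the result of json.load on the file.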
punchplatform-deployment.settings¶
We are taking advantage of the major update to modify the configuration in the punchplatform-deployment.settings. Details will be added gradually.