
Breaking changes in 6.0

This section discusses the changes you need to be aware of when migrating your platform from 5.x (CRAIG) to 6.x.

6.0.0 - Release Candidate

This version is a Release Candidate.
Although most changes have been made, significant changes may still occur. This version is currently being validated and should be considered unstable: please let us know about any difficulties you encounter.

The main improvements and changes are the following:

  • PML and topologies are now expressed through a single, generic format called punchlines.
    • No more spouts, bolts, topologies or PML nodes: only punchlines made of nodes.
  • Python is now officially supported: punchlines can be executed using PySpark or Spark.
    • Python nodes can be interleaved with Java nodes.
    • Python 3 is required.
    • PEX packaging is used to distribute Python punchlines for execution.
  • Elastic has been upgraded to the 7.x major release.

Make sure you go through the rest of this guide.

Requirements

The Punch platform is supported exclusively with:

  • Java 8
  • Python 3.6.8 or later
  • Ansible 2.9.0
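A quick way to check the Python requirement on a target host is sketched below. This is not a Punch tool: python_is_supported is a hypothetical helper that only compares the interpreter version against the 3.6.8 minimum stated above.

```python
import sys

# Minimum version stated in the 6.x requirements: Python 3.6.8 or later.
MIN_PYTHON = (3, 6, 8)


def python_is_supported(version_info=None):
    """Return True if the given (or current) interpreter version is 3.6.8 or later."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); the release level is ignored.
    return tuple(version_info[:3]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("Python %d.%d.%d or later is required" % MIN_PYTHON)
    print("Python version OK:", sys.version.split()[0])
```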

OS supported

Some older operating systems are no longer officially supported: installing on them may still work, but without warranty. Version 6.x targets the LTS operating systems available between 2018 and 2020.

Officially supported:

  • Ubuntu 18.04 LTS (EOL April 2023)
  • CentOS 8 (EOL 2029)
  • Debian 8 (EOL June 2020)
  • Debian 9 (EOL June 2022)

Deprecated:

  • Ubuntu 16.04 LTS (EOL April 2021)
  • CentOS 7 (EOL 2024)

System daemon

systemd replaces supervisord as the single system daemon used to manage deployed components.

Command lines

Starting with 6.0, the punch commands have been revamped around channelctl, planctl and punchlinectl, complemented by a few other tools:

  • channelctl replaces the 5.x punchctl.
  • planctl lets you start a plan in the foreground and provides operational commands to reset the plan cursors.
  • punchlinectl: lets you start a punchline of any type (spark, punch, pyspark, storm).
  • storectl: provides an interactive tool to query the archived data.
  • sparkctl is an internal client tool used to submit Spark punchlines and plans. It is meant to remain an internal script: prefer punchlinectl, which provides auto-completion and parameter checking.
  • punchpkg: a new development tool that helps you package your custom Python nodes as part of the punch node library.
    • This tool is available as a standalone tool only.

Important

On production systems, only channelctl should be used. The punchlinectl tool is intended for punchline developers.

The following commands have been deprecated:

  • punchplatform-channel.sh: use channelctl instead.
  • punchplatform-pyspark.sh: already deprecated since 5.7.x. Use punchlinectl and planctl instead.
  • punchplatform-analytics.sh: already deprecated since 5.7.x. Use punchlinectl and planctl instead.
  • punchplatform-topology.sh: use punchlinectl instead.
  • punchplatform-archive-client.sh: use storectl instead.
  • punchctl: replaced by channelctl. Note that the --job option has been replaced by --application.
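When migrating operational scripts, a small scan for the deprecated commands can help. The sketch below is a hypothetical helper (not a punch tool) that flags the commands listed above, plus the old --job option, in the text of a shell script. It only reports findings and rewrites nothing, since punchplatform-pyspark.sh and punchplatform-analytics.sh map to two different tools.

```python
import re

# Deprecated 5.x commands and their 6.0 replacements, as listed in this guide.
DEPRECATED_COMMANDS = {
    "punchplatform-channel.sh": "channelctl",
    "punchplatform-pyspark.sh": "punchlinectl / planctl",
    "punchplatform-analytics.sh": "punchlinectl / planctl",
    "punchplatform-topology.sh": "punchlinectl",
    "punchplatform-archive-client.sh": "storectl",
    "punchctl": "channelctl",
}


def _word(token):
    # Match the token only when it is not glued to other name characters.
    return re.compile(r"(?<![\w.-])" + re.escape(token) + r"(?![\w.-])")


_PATTERNS = {old: _word(old) for old in DEPRECATED_COMMANDS}
_JOB_OPTION = _word("--job")


def find_deprecated(script_text):
    """Return (line number, deprecated item, replacement) tuples for a script's text."""
    findings = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        for old, pattern in _PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, old, DEPRECATED_COMMANDS[old]))
        if _JOB_OPTION.search(line):
            findings.append((lineno, "--job", "--application"))
    return findings
```

Matching whole tokens rather than substrings avoids false positives, for example an already migrated channelctl call is never reported.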

reload

The 'reload' channel command has been deactivated in 6.0. Use stop and start on the selected punchlines instead. This function is planned to return in 6.1 and subsequent releases.

Channel Structure File Format

  • The jobs property is replaced by applications.
  • The version identifier is expected to be "6.0".
  • Previous versions are automatically translated at runtime. This should not prevent you from switching to the new format as quickly as possible.
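If you prefer to migrate channel structure files yourself rather than rely on runtime translation, the change can be sketched as follows. This assumes the file has already been parsed into a Python dict (channel structure files may use a syntax the standard json module cannot read) and that the version identifier sits under a top-level "version" key. to_v6_channel_structure is a hypothetical helper, and application entries are copied unchanged.

```python
import copy


def to_v6_channel_structure(structure):
    """Return a 6.0-style copy of a parsed 5.x channel structure dict.

    Assumption: the version identifier is stored under a top-level "version" key.
    """
    if "jobs" in structure and "applications" in structure:
        raise ValueError("structure defines both 'jobs' and 'applications'")
    converted = copy.deepcopy(structure)  # never mutate the caller's dict
    if "jobs" in converted:
        # The 5.x "jobs" property is renamed to "applications"; entries are kept as-is.
        converted["applications"] = converted.pop("jobs")
    converted["version"] = "6.0"
    return converted
```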

Punchline Configuration and Format

The major improvement in 6.0 is a single format for all sorts of punchlines. No more topologies with spouts and bolts: every punchline, Storm or Spark, stream or batch, is expressed with the same new format.

Important

To ease migrations and upgrades, the old formats, in particular topologies and 5.x PML formats, are still supported. In addition, punchlinectl provides a convert option to automatically convert old-style formats to the new one.

Metrics update

  • Metric spark.punch.job.deploy.mode is replaced by spark.punch.application.deploy.mode
  • Metric platform.job is replaced by platform.application
  • Metric job.deploy.mode is replaced by application.deploy.mode
  • Metric job.recovery.time_zone is replaced by application.recovery.time_zone
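Dashboards, alerting rules or scripts that reference the old metric names need the same renaming. A minimal sketch, assuming metric names appear as exact, full field names in a flat document (no prefix matching); rename_metric and rename_metric_fields are hypothetical helpers.

```python
# Old 5.x metric names and their 6.0 replacements, as listed above.
METRIC_RENAMES = {
    "spark.punch.job.deploy.mode": "spark.punch.application.deploy.mode",
    "platform.job": "platform.application",
    "job.deploy.mode": "application.deploy.mode",
    "job.recovery.time_zone": "application.recovery.time_zone",
}


def rename_metric(name):
    """Return the 6.0 name for an old metric name, or the name unchanged."""
    # Exact lookup: "job.deploy.mode" is also a suffix of
    # "spark.punch.job.deploy.mode", so substring replacement would be wrong.
    return METRIC_RENAMES.get(name, name)


def rename_metric_fields(document):
    """Return a copy of a flat metric document with old field names renamed."""
    return {rename_metric(key): value for key, value in document.items()}
```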

Components

Elastic upgrade

In this Punchplatform version, the Elastic components have been upgraded to version 7, which introduces new features and some breaking changes compared with Elastic 6.

In particular, mapping types (the _type metadata field) are deprecated. Consequently, the type parameter has been removed from all nodes that interact with Elasticsearch, such as the Elastic_batch_input and Elastic_batch_output nodes.

The full list of breaking changes between these two Elastic major versions is available in the Elastic documentation: Breaking changes 7.0
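Existing punchlines that still set a mapping type on Elasticsearch nodes must drop it. The sketch below illustrates the change under explicit assumptions: the punchline has been parsed into a dict, its nodes are listed under "dag" with a node "type" and a "settings" dict, and the removed parameter lived in settings["type"]. strip_es_mapping_type is hypothetical, and identifying Elasticsearch nodes by an "elastic" type prefix is a heuristic.

```python
import copy


def strip_es_mapping_type(punchline):
    """Return a copy of a parsed punchline without the 'type' setting on Elastic nodes.

    Assumptions (hypothetical layout): nodes are listed under "dag", each with a
    node "type" and a "settings" dict, and the mapping type was set in
    settings["type"].
    """
    converted = copy.deepcopy(punchline)
    for node in converted.get("dag", []):
        node_type = str(node.get("type", "")).lower()
        # Heuristic: treat nodes whose type starts with "elastic" as Elasticsearch nodes.
        if node_type.startswith("elastic") and isinstance(node.get("settings"), dict):
            node["settings"].pop("type", None)
    return converted
```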

Storm Punchlines

User Defined Nodes

The support of user defined storm nodes has been significantly improved. Refer to the new Storm Punchline User Node chapter.

Starting with 6.1, the way you provide your nodes is completely different. Instead of providing an uber jar that had to embed all punch dependencies, you now provide a lightweight jar containing only your classes.

This is however a breaking change: you must update both your custom node Maven projects and your punchline configurations.

Spark Punchlines

PySpark nodes use PEX as the single node packaging format.
punchlinectl is the main analytics command.

Pyspark

The PySpark package has been refactored into punchline_python. Imports from our public API for developing custom nodes must therefore use punchline_python as the root module:

# old in 5.x or earlier
from core.holders.input_holder import InputHolder
# new in 6.0 or later
from punchline_python.core.holders.input_holder import InputHolder
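For larger code bases, this import change can be applied mechanically. The hypothetical helper below rewrites "from core..." statements only; plain "import core..." statements also change the name bound in your code, so they are left for manual review.

```python
import re

# "from core..." at the start of a line (indentation allowed), e.g.
# "from core.holders.input_holder import InputHolder".
_OLD_FROM_IMPORT = re.compile(r"^([ \t]*)from([ \t]+)core\b", re.MULTILINE)


def migrate_imports(source):
    """Prefix 5.x 'from core...' imports with the 6.0 'punchline_python' root module."""
    return _OLD_FROM_IMPORT.sub(r"\1from\2punchline_python.core", source)
```

The rewrite is idempotent: already migrated imports start with punchline_python and are not matched again.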

Modules:

We no longer bundle modules such as numpy, pandas or scikit-learn by default, so that users remain free to upgrade these modules to their latest releases.

An example of how to write custom nodes and include your own modules with the new punchpkg CLI is available at:
https://github.com/punchplatform/starters/tree/6.x/pyspark/custom_node_python
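Since these modules are no longer provided, a custom node may want to detect early that one of them is missing from its environment. A minimal sketch using the standard importlib.util.find_spec; missing_modules is a hypothetical helper, and the module list (using import names, so scikit-learn appears as "sklearn") is only the example list from above.

```python
import importlib.util

# Third-party modules no longer bundled by default (import names; scikit-learn
# is imported as "sklearn").
UNBUNDLED_MODULES = ("numpy", "pandas", "sklearn")


def missing_modules(names=UNBUNDLED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed, package them with your node:", ", ".join(missing))
```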

Punch Kibana Plugin

The Punch Kibana plugin now uses the REST API exposed by the punch REST Gateway. As a result, the configuration of Kibana (kibana.yaml) and of punchplatform.properties has changed.

Rest API Gateway

The REST API routes use different endpoints for tenant services:

  • Elasticsearch forwarding: /es/<es_cluster_id>/*
  • Punchline executions:
    • /punchline
    • /punchline/save
    • /punchline/scan
    • /punchline/<punchline_id>
    • /punchline/<punchline_id>/executions/<execution_id>
    • /punchline/executions
    • /punchline/executions/<execution_id>
  • Registry for resource hosting:
    • /registry
    • /download/<resource_name>
    • /metadata/<resource_name>
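Client code calling the gateway can build these routes from a base URL. This sketch assumes the routes are served under the versioned base URL used in the Kibana plugin example of this guide (https://server1:4242/v1); adapt it to your deployment. It only builds URLs and assumes nothing about the HTTP methods or payloads of the routes; route and es_url are hypothetical helpers.

```python
from urllib.parse import quote

# Assumption: routes are served under the versioned base URL used in the
# Kibana plugin example of this guide. Adapt host, port and prefix.
GATEWAY_BASE_URL = "https://server1:4242/v1"


def route(*segments, base=GATEWAY_BASE_URL):
    """Build a gateway URL from path segments, percent-encoding each segment."""
    encoded = "/".join(quote(str(segment), safe="") for segment in segments)
    return base.rstrip("/") + "/" + encoded


def es_url(es_cluster_id, es_path, base=GATEWAY_BASE_URL):
    """Build an Elasticsearch forwarding URL: /es/<es_cluster_id>/<es_path>."""
    # Keep "/" and "*" so that index patterns and sub-paths pass through.
    return route("es", es_cluster_id, base=base) + "/" + quote(es_path.lstrip("/"), safe="/*")
```

For instance, route("punchline", "my_punchline", "executions", 42) yields the /punchline/<punchline_id>/executions/<execution_id> form; the resulting URL can then be passed to any HTTP client configured with your gateway's TLS settings.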

Configuration Changes

This chapter lists all configuration changes in detail.

punchplatform.properties

We took advantage of this major update to modify the configuration in punchplatform.properties.

Components:

  • Elasticsearch
  • Kibana
  • Rest API Gateway

Elasticsearch

The Opendistro Security plugin configuration section has changed: some keys are different, and new SSL/TLS settings have been added:

{
  "plugins":{
    "opendistro_security": {
      "local_ssl_certs_dir": "/tmp/certs/es/server1",
      "ssl_transport_pemkey_name": "node-key-pkcs8.pem",
      "ssl_transport_pemcert_name": "node-cert.pem",
      "ssl_transport_pemtrustedcas_name": "rootca-cert.pem",
      "admin_pemcert_name": "admin-cert.pem",                
      "admin_pemkey_name": "admin-key.pem",                  
      "admin_pemtrustedcas_name": "cachain.pem",
      "authcz_admin_dn": ["emailAddress=admin@thalesgroup.com,CN=admin,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"],
      "nodes_dn": ["emailAddress=node@thalesgroup.com,CN=node,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"],
      "kibana_index": ".kibana-domainname"
    }
  }
}

Info

Note that local_ssl_certs_dir is now set in the punchplatform.properties file instead of punchplatform-deployment.settings.
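When reviewing an existing punchplatform.properties, it may help to compare your section against the keys of the example above. The sketch below is a hypothetical check, not a platform validator: it takes the parsed dict that contains the "plugins" object (wherever it sits in your file), and a key reported as missing is not necessarily mandatory.

```python
# Keys of the opendistro_security section, as shown in the 6.0 example above.
OPENDISTRO_KEYS = (
    "local_ssl_certs_dir",
    "ssl_transport_pemkey_name",
    "ssl_transport_pemcert_name",
    "ssl_transport_pemtrustedcas_name",
    "admin_pemcert_name",
    "admin_pemkey_name",
    "admin_pemtrustedcas_name",
    "authcz_admin_dn",
    "nodes_dn",
    "kibana_index",
)


def missing_opendistro_keys(conf):
    """List the example keys absent from conf["plugins"]["opendistro_security"]."""
    section = conf.get("plugins", {}).get("opendistro_security", {})
    return [key for key in OPENDISTRO_KEYS if key not in section]
```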

Kibana

The Punchplatform plugin is now configured with the REST API hosts and no longer needs the previous configuration:

{
  "plugins":{
    "punchplatform": {
      "rest_api": {
        "hosts": ["https://server1:4242/v1"]
      }
    }
  }
}

The Opendistro Security plugin configuration section has been removed.

New SSL/TLS settings:

{
  "kibana" : {
    "domains" : {
      "mydomain" : {
        "local_ssl_certs_dir": "/tmp/certs/kibana/mydomain",
        "server_ssl_enabled": true,
        "server_ssl_key_name": "kibana-server-key.pem",
        "server_ssl_certificate_name": "kibana-server-cert.pem",
        "elasticsearch_ssl_enabled": true,
        "elasticsearch_ssl_verificationMode": "none",
        "elasticsearch_ssl_certificateAuthorities_names": ["server-cachain.pem"]
      }
    }
  }
}

Plugins can be configured:

  • Globally, inside the kibana section
  • Locally, inside each domain section, overriding global configurations
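The global/local rule above amounts to merging the domain-level plugin settings over the global ones. A minimal sketch, assuming both levels use a "plugins" key (under the kibana section and under each entry of kibana.domains) and a recursive, key-by-key merge; the platform may merge differently (for example by replacing whole plugin sections), and deep_merge and effective_plugins are hypothetical helpers.

```python
import copy


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_plugins(kibana_section, domain):
    """Plugin settings for one domain: global settings overridden by the domain's own."""
    global_plugins = kibana_section.get("plugins", {})
    domain_conf = kibana_section.get("domains", {}).get(domain, {})
    return deep_merge(global_plugins, domain_conf.get("plugins", {}))
```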

Info

Note that local_ssl_certs_dir is now set in the punchplatform.properties file instead of punchplatform-deployment.settings. It is also configurable per domain.

Rest API Gateway

The gateway configuration now provides most of the old Punch Kibana plugin configuration, plus additional settings related to the new REST API routes:

{
  "gateway": {
    "clusters": {
      "cluster1": {
        "tenant": "mytenant",
        "modsecurity_enabled": false,
        "servers": {
          "server1": {
            "inet_address": "172.28.128.21",
            "port": 4242,
            "ssl": {
              "enabled": true,
              "local_key_store_path": "/tmp/jks/gateway.keystore",
              "key_store_type": "jks",
              "key_store_password": "gateway",
              "key_alias": "gateway",
              "key_password": "gateway"
            }
          }
        },
        "elasticsearch": {
          "data_cluster": {
            "cluster_id": "es_search",
            "hosts": ["server2:9200"],
            "settings": ["es.index.read.missing.as.empty: yes", "es.nodes.discovery: true"],
            "ssl_enabled": true
          },
          "metric_cluster": {
            "cluster_id": "es_search",                             
            "hosts": ["server2:9200"],
            "index_name": "mytenant-metrics",
            "settings": ["es.index.read.missing.as.empty: yes", "es.nodes.discovery: true"],
            "ssl_enabled": true
          }
        },
        "services": {
          "extraction": {
            "formats": ["csv", "json"]
          },
          "registry": {
            "type": "file",
            "settings": {
              "root_path": "/home/vagrant/pp-conf/tenants",
              "create_root": true
            }
          }
        },
        "resources": {
          "resources_dir": "/home/vagrant/pp-conf/resources",
          "doc_dir": "/data/opt",
          "tmp_dir": "/tmp",
          "archives_dir": "/tmp"
        },
        "reporters": [
          {
            "type": "elasticsearch",
            "cluster_name": "es_search",
            "index_name": "mytenant-gateway-logs",
            "credentials": {
              "user": "admin",
              "password": "admin"
            },
            "ssl_enabled": true
          }
        ]
      }
    }
  }
}
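To cross-check the gateway and Kibana settings, it can help to derive each gateway server's base URL from this configuration. A minimal sketch, assuming the configuration has been parsed into a Python dict; gateway_base_urls is hypothetical, and the /v1 prefix seen in the Kibana plugin example is not part of this configuration, so it is not added here.

```python
def gateway_base_urls(conf):
    """Map (cluster id, server id) to the base URL each gateway server listens on."""
    urls = {}
    for cluster_id, cluster in conf["gateway"]["clusters"].items():
        for server_id, server in cluster.get("servers", {}).items():
            # Scheme follows the server's "ssl.enabled" flag.
            scheme = "https" if server.get("ssl", {}).get("enabled") else "http"
            urls[(cluster_id, server_id)] = "%s://%s:%d" % (
                scheme, server["inet_address"], server["port"])
    return urls
```

Keys are (cluster id, server id) pairs because server ids are only unique within a cluster.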

punchplatform-deployment.settings

We took advantage of this major update to modify the configuration in punchplatform-deployment.settings. Details will be added gradually.