5.1.2 to 5.2.0

This document explains which configuration changes MUST be performed when updating a PunchPlatform from version 5.1.2 to 5.2.0.

General changes

Elasticsearch _doc type

Mapping types are deprecated as of Elasticsearch 7 and removed in Elasticsearch 8. To prepare for this, Punch now uses the _doc type wherever possible.

Elasticsearch announcement

Indices created in Elasticsearch 6.0.0 or later may only contain a single mapping type.
Indices created in 5.x with multiple mapping types will continue to function as before in Elasticsearch 6.x.
Types will be deprecated in APIs in Elasticsearch 7.0.0, and completely removed in 8.0.0.
Source: removal of types

New indices created by Punch now default to the '_doc' type instead of 'doc'. An option lets you override this value if necessary.

Changes to be performed

You are advised to update your platform Elasticsearch index templates so that your mappings apply to the '_doc' type instead of 'doc'.

If you update without having explicitly set the output document type at the ElasticsearchBolt, ElasticsearchSpout or FileBolt (when archiving) level, the document type will change. This can cause mapping exceptions if the output index already exists (e.g. intraday during migration). To avoid this, change the target index name in these component settings to a different naming pattern, at least during the migration:

- For ElasticsearchBolt, use the 'index.prefix' per-stream setting, e.g. "index":{"type":"daily","prefix":"mytenant-events-b"}
- For the topology json 'Elasticsearch metrics reporter', use 'index_suffix', e.g. "index_suffix" : "metrics-b"
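
The template update advised above can be sketched in a few lines of Python. This is only an illustration, assuming an Elasticsearch 6.x style template whose "mappings" object is keyed by the type name. The helper name retype_template and the sample template are hypothetical and not part of Punch.

```python
import copy


def retype_template(template, old_type="doc", new_type="_doc"):
    """Return a copy of an index template with its mapping type renamed.

    Assumes an Elasticsearch 6.x style template, where "mappings" is an
    object keyed by the mapping type name. The input is left untouched.
    """
    updated = copy.deepcopy(template)
    mappings = updated.get("mappings", {})
    if old_type in mappings:
        if new_type in mappings:
            # Refuse to silently merge two mapping definitions.
            raise ValueError("template already defines %r" % new_type)
        mappings[new_type] = mappings.pop(old_type)
    return updated


# Hypothetical template, for illustration only.
template = {
    "index_patterns": ["mytenant-events-*"],
    "mappings": {"doc": {"properties": {"message": {"type": "text"}}}},
}

migrated = retype_template(template)
print(list(migrated["mappings"]))
```

The helper only rewrites the template body. Pushing the result back to Elasticsearch, and renaming the target indices during the migration, remain separate steps.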

Platform tenant

To simplify certain platform tasks, such as the housekeeping of Beats indices, a platform tenant is now defined. It is mainly used to define administration channels that run housekeeping and so manage the lifecycle of platform-level monitoring indices.

Removing services

The services concept, in charge of running the elastic and archives housekeepers, is now deprecated. Services are now plain channels. An "admin" channel is used instead, defined as part of a tenant.

Tenant configuration

In each tenant, the etc/conf.json file is simplified. The Elasticsearch and archive housekeeping configurations have moved to the channel in charge of running these services (see Removing services).
Read the official section to apply the changes.

Channel configuration

The channel_structure.json format has changed: a channel can now execute various types of jobs.
Read the official section to apply the changes.

PML

Dictionary Format

The array-oriented PML format is now deprecated and will be completely removed in the Dave release. Convert your PMLs to the new dictionary format:

{
  job: [
    {
      type: elastic_batch_input
      # ....  
    }
    {
      type: show
      # ....  
    }
  ]
}

You can also add more options such as spark_settings, tenant, runtime_id or a global description.
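
As a rough illustration of the conversion, the sketch below wraps a legacy PML into the dictionary layout shown above. It assumes that the deprecated array-oriented format is a top-level list of node objects, which this guide does not spell out, and that extra options sit next to the job key. The helper name to_dictionary_pml is hypothetical.

```python
import json


def to_dictionary_pml(nodes, **options):
    """Wrap a list of PML nodes into the dictionary format.

    Assumption: the legacy PML is a plain list of node objects. Extra
    keyword arguments (e.g. tenant, runtime_id) are added at the top level.
    """
    if not isinstance(nodes, list):
        raise TypeError("expected the legacy PML to be a list of nodes")
    pml = {"job": nodes}
    pml.update(options)
    return pml


# Hypothetical legacy PML with the two node types used in the example above.
legacy = [{"type": "elastic_batch_input"}, {"type": "show"}]

converted = to_dictionary_pml(legacy, tenant="mytenant")
print(json.dumps(converted, indent=2))
```

Check the converted file against the official PML documentation before deploying it, since node-level settings may also have changed.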

Stream simplification

The publish/subscribe stream settings have been simplified.

Instead of:

{
  "publish": [
    {
      "field": "bob",
      "tag": "input_transform"
    }
  ]
}

Now use:

{
  publish: [
    {
      stream: "input_transform",
      # optionally add an '"alias": "input_transform"' if you can not change stream name.
    }
  ]
}
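
The same rewrite can be scripted when many nodes need updating. The sketch below is a minimal illustration, assuming (as in the example above) that the legacy "tag" value becomes the new stream name and that "field" is no longer needed. The helper name simplify_publish is hypothetical, and the optional alias setting is not handled.

```python
def simplify_publish(settings):
    """Return a copy of node settings with 'publish' entries simplified.

    Assumption: each legacy entry looks like {"field": ..., "tag": ...}
    and its "tag" becomes the "stream" name. Entries that already carry
    a "stream" key are kept unchanged.
    """
    simplified = dict(settings)
    if "publish" in settings:
        simplified["publish"] = [
            entry if "stream" in entry else {"stream": entry["tag"]}
            for entry in settings["publish"]
        ]
    return simplified


legacy = {"publish": [{"field": "bob", "tag": "input_transform"}]}
migrated = simplify_publish(legacy)
print(migrated)
```

The function returns a new dictionary and does not modify its input, so the legacy settings stay available for comparison.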

Punchctl

A new punchctl command is available. It will replace the traditional punchplatform-*.sh operator commands.
Although the traditional commands still exist, it is strongly recommended that you use and document only the new punchctl command.

Deployer changes

Index prefix for beats

It is now possible to add a prefix of your choice to the Beats (metricbeat, filebeat, ...) indices.
The recommended approach is to use the platform- prefix, so that housekeeping tasks can be run on all indices sharing that prefix.

Ceph

A dedicated migration guide will be created.

Pygregator

A dedicated migration guide will be created.

Administration and Monitoring Changes

The 5.2.0 release provides significant improvements to the monitoring and administration of channels and Shiva. A new platform-level index, platform-logs-*, is now used to trace all important administrative events. The platform-shiva-logs-* index has been deprecated.

These events, as well as those generated by Shiva child jobs, are now normalised and documented. This may require changes to your administrative Kibana dashboards, if you have any.