4.x to 5.x

This document explains the mandatory configuration changes to perform when updating a PunchPlatform from version Brad (4.0.x) to Craig (5.x).

Overview

The Craig version relies on Elasticsearch/Kibana/Beats from the Elastic 6.4.0 stack. The resulting breaking changes are the following:

  1. Elasticsearch indices no longer support several types. Multi-type indices were used to store the punchplatform metrics. Starting from Craig, all indices, including the ones used for metrics, have to be defined with a single type.
  2. To anticipate the upcoming deprecation of the transport and node Elasticsearch client libraries, the Craig components (bolts, reporters, injectors) rely only on the high-level REST Elasticsearch library. All indexing traffic therefore passes through the HTTP port (typically 9200), and the Elasticsearch cluster must be updated to support the query format of the updated Elasticsearch clients (e.g. PunchPlatform channels).
  3. Index templates that provide the field mappings (indexing options) have to be reworked for a single document type per index (and the deprecated single-value template string attribute should be replaced by an index_patterns array of patterns).

In addition, channel configuration changes have occurred (details are in the following parts of this document). Mainly:

  • Move of most configuration from "shared" to "per tenant" (e.g. channel templates, parsers)
  • Change of the Storm stream configuration from implicit to explicit for metrics and error document management
  • Change in the settings of spouts and bolts

Punchplatform Properties Configuration


Shiva section

Most shiva configuration items moved from the punchplatform-deployment.settings to the punchplatform.properties. Update these two files after reading the shiva sections in the ADMINISTRATION GUIDE/CONFIGURATION GUIDE.

Metricbeat section

In Craig, the administrator can configure the reporting interval per metricset for the same module. This feature implies the following breaking change in the metricbeat configuration:

"metricbeat" : {
    "modules" : {
        "system" : {
            "high_frequency_system_metrics": {
                "metricsets" : ["cpu","load","memory"],
                "reporting_interval" : "30s"
            },
            "normal_frequency_system_metrics": {
                "metricsets" : ["fsstat"],
                "reporting_interval" : "5m"
            },
            "slow_frequency_system_metrics": {
                "metricsets" : ["uptime"],
                "reporting_interval" : "1h"
            }
        }
    },
    "kafka" : {
        "cluster_id" : "local",
        "topic_name" : "platform-system-metrics"
    }
}

Punchplatform Deployment Configuration

Kafka brokers settings

The Kafka brokers JVM memory size is now configurable per cluster. So instead of configuring it globally in punchplatform-deployment.settings like:

"kafka_brokers_jvm_xmx": "256M"

You should now use the punchplatform.properties file as follows:

"kafka" : {
  "clusters" : {
      "local" : {
          "brokers" : ["node01:9092", "node02:9092", "node03:9092"],
          "zk_cluster" : "common",
          ...
          "kafka_brokers_jvm_xmx": "256M"
      }
  },
  "install_dir" : "/data/opt/kafka_2.11-0.11.0.0"
}

Channels structure

Two sections should be removed from the channel_structure.json files within each channel configuration folder (they will be silently ignored):

  • metrics
  • autotest_latency_control
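
As an illustration, a minimal channel_structure.json after the migration no longer carries those two sections (the topology entry shown here is hypothetical):

```json
{
  "topologies" : [
    {
      "topology" : "input.json",
      "execution_mode" : "cluster",
      "cluster" : "main"
    }
  ]
}
```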

Topology File Configuration

Spouts metrics activation

To activate the [self_monitoring] option in a spout, note that the self_monitoring.frequency parameter has been renamed to self_monitoring.period:

{
  ...
  "spouts": [
    {
      "type": "syslog_spout",
      "spout_settings": {
        "listen": {
          ...
        },
        "self_monitoring.activation": true,
        "self_monitoring.period": 10
      },
      "storm_settings": {
        ...
      }
    }
  ],
  ...
}

Metric reporters

In topologies, the metrics reporter section must now be an array containing maps of reporters. Typically, it will now look like this:

"metrics" : {
  "reporters" : [
    {
      "type" : "elasticsearch",
      ...
    },
    {
      "type" : "kafka",
      ...
    }
  ]
}

Elasticsearch Reporter

The Elasticsearch reporter section (if any) must now contain only the target cluster identifier.

"metrics" : {
  "reporters" : [
    {
      "type" : "elasticsearch",
      "cluster_id" : "es_search"
    }
  ]
}

cluster_name deprecation

cluster_id is now used instead of [cluster_name]. cluster_name will still be accepted for backward compatibility. It is also allowed (but not encouraged) to explicitly set the HTTP urls as follows:

  "metrics": {
    "reporters": [
      {
        "type": "elasticsearch",
        "http_hosts": [
          {
            "host": "host1",
            "port": 9200
          },
          {
            "host": "host2",
            "port": 9200
          }
        ]
      }
    ]
  }

Warning

The [nodes], [max_results_size] and [native_es_settings] settings are deprecated and will be silently ignored.

Slf4j reporter

The slf4j reporter has been renamed to be more generic. Please now use the logger reporter instead.

"metrics" : {
  "reporters" : [
    {
      "type" : "logger",
      "format" : "json"
    }
  ]
}

Note

See Metrics to get a full description of available options.

Elasticsearch Bolt

The Elasticsearch Bolt has been simplified and has additional settings options (cf. the bolt documentation).

1) It used to have one output configuration for the "nominal" documents, and another one for "error" documents, with different specific parameters. It can now be configured with a distinct output configuration for each of any number of Storm streams, without any specific "errors" management behaviour.

Danger

The error_index parameter is needed to indicate where documents must be inserted in case of insertion rejection by Elasticsearch (e.g. a mapping exception). Not providing this parameter will cause information about the rejection to be dropped, resulting in data loss.
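
A sketch of how this parameter could appear in the bolt settings; the index names and the exact value shape are assumptions, check the bolt documentation for the actual syntax:

```json
{
  "type" : "elasticsearch_bolt",
  "bolt_settings" : {
    "cluster_id" : "es_search",
    "index" : "events-mytenant",
    "error_index" : "errors-mytenant"
  }
}
```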

2) It now optionally accepts a list of target HTTP endpoints instead of relying on the cluster_id property.

I.e instead of using:

{
  "type" : "elasticsearch_bolt",
  "bolt_settings" : {
    "cluster_id" : "es_search",
    ...
  }
  ...
}

you can instead use:

{
  "type" : "elasticsearch_bolt",
  "bolt_settings" : {
    "http_hosts" : [ { "host" : "host1", "port" : 9200 }, { "host" : "host2", "port" : 9200 } ],
    ...
  }
  ...
}

3) The queue_size setting does not exist anymore for the ElasticsearchBolt. In BRAD, the ES Bolt synchronously sent a batch AFTER the previous one succeeded. In CRAIG, the ES Bolt asynchronously sends batches as soon as they are full. Depending on your topology max_spout_pending, many batches may therefore be concurrently submitted to the ES indexation queue, increasing the risk of a request timeout for some of them if ES is loaded. Take care to size your request_timeout accordingly, to avoid Storm tuple failures at ES Bolt level and replays by the topology, or reduce your max_spout_pending so that fewer batches are concurrently submitted.
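
As an illustration, the two knobs mentioned above could be tuned together in the topology file (the values and exact setting names are illustrative, not recommendations):

```json
{
  "storm_settings" : {
    "topology.max.spout.pending" : 1000
  },
  "bolts" : [
    {
      "type" : "elasticsearch_bolt",
      "bolt_settings" : {
        "request_timeout" : "60s"
      }
    }
  ]
}
```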

Kafka Spout

The KafkaSpout relies on the new internal Kafka library. There are a few configuration changes:

  • fetch.max.bytes is deprecated, you must now use receive.buffer.bytes
    • this property indicates the amount of data fetched at each Kafka poll operation
    • the default value is now 64K
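
A kafka spout fragment carrying the renamed property might look like this (the broker and topic settings are illustrative):

```json
{
  "type" : "kafka_spout",
  "spout_settings" : {
    "brokers" : "local",
    "topic" : "mytenant-input-topic",
    "receive.buffer.bytes" : 65536
  }
}
```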

Kafka Bolt

The KafkaBolt now relies exclusively on the Kafka producer library settings for batching several writes as part of a single IO operation.

In addition, the KafkaBolt no longer supports the embedding of several incoming messages in a single Kafka message. This was a sort of applicative bulk strategy that is now unsupported, as it added little value compared to the standard Kafka producer batching options.

Explicit streams for errors and metrics in Spouts and Bolts configuration

A major change has taken place in the configuration of spouts and bolts in channel topologies: in PunchPlatform releases up to Brad (included), some hidden/implicit Storm data streams were automatically generated in Storm topologies, without user configuration:

  • streams for propagating punchplatform application metrics "alongside" the user data (following about the same path throughout the topology spouts and bolts)
  • streams for catching, handling and propagating "errors" documents. This was intended to avoid loss of documents that caused a processing error in some Punchlet, ensuring that they would flow through the topologies, and be stored at the end.

These "automagic" streams were great for 'simple' cases, but harmful in most real-world contexts, where nobody understood what happened in the shadows; it was therefore difficult to understand failure conditions, and impossible to configure what should be done with errors, metrics...

The Craig release therefore expects the Storm topology configuration to explicitly declare the streams (publish/subscribe) for error or metrics tuples, in addition to the usual "user data/logs" Storm streams.

To that purpose, the spouts/bolts documentation has of course been updated to show the new published streams (errors, metrics). These streams can either be published, and used for later processing/forwarding/storage in the topology, or left unpublished, which will simply drop these tuples (as is always the case in a Storm topology).

Danger

Because in Craig the reserved 'id' field is now '_ppf_id', the error handling mechanism of the PunchBolt will only be able to provide a '_ppf_id' field in the error flow if one is provided in the input stream of the punch bolt.

The _ppf_id field should be provided by the LTR, or added/translated at input by the back-office (for example, by publishing _ppf_id at lumberjack spout level).
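
For instance, a lumberjack spout could publish _ppf_id alongside the log field (the stream and other field names are illustrative):

```json
{
  "type" : "lumberjack_spout",
  "storm_settings" : {
    "publish" : [
      {
        "stream" : "logs",
        "fields" : [ "log", "_ppf_id" ]
      }
    ]
  }
}
```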

Main changes for this explicit handling are :

1) Add "_ppf_metrics" stream publishing in all Lumberjack Spout, Syslog Spout, Punch Bolt, Kafka Spout

{
    "publish" : [
            {
            "stream": "_ppf_metrics",
            "fields": [
              "_ppf_latency"
            ]
          }
    ]
}

2) Add "_ppf_metrics" stream subscription in all Punch Bolts, Kafka Bolt, Elasticsearch Bolt
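
The subscription side mirrors the publish side; for example, a bolt subscribing to the metrics stream of an upstream spout (the component name is illustrative):

```json
{
  "subscribe" : [
    {
      "component" : "syslog_spout",
      "stream" : "_ppf_metrics"
    }
  ]
}
```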
3) Add "_ppf_errors" stream publication in all Punch Bolts :

    "publish" : [
            [...],

            { "stream": "_ppf_errors", "fields": ["_ppf_error_message", "_ppf_error_document", "_ppf_id"] }
    ]

4) Remove the exception_catcher_bolt (or develop a specific one if you have a specific need, but make sure it always delivers values in the output fields for the error document stream/storage). Was:

    "exception_catcher_bolt" : {
      "punchlet" : "punchlets/Common/exception_handler.punch",
      "executors" : 1
    }, 

5) Configure a dedicated Kafka topic for storing the error documents, so that they are no longer mixed with correctly parsed documents. This is because the fields are not the same, and a different Elasticsearch Bolt configuration will be needed at the end, so they should not be mixed in the same Storm stream.
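
A sketch of a Kafka bolt dedicated to the error stream (the broker, topic and component names are illustrative):

```json
{
  "type" : "kafka_bolt",
  "bolt_settings" : {
    "brokers" : "local",
    "topic" : "mytenant-errors"
  },
  "storm_settings" : {
    "subscribe" : [
      { "component" : "punch_bolt", "stream" : "_ppf_errors" }
    ]
  }
}
```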

Elasticsearch indices templates

The following changes have to be made to the Elasticsearch index templates (see resources/elasticsearch/templates):

  • In all template files, replace

    "template": "<somepattern>" by "index_patterns": [ "<somepattern>" ]

This can be achieved with the following command (please remember to back up your configuration beforehand):

        sed -i 's/"template"\s*:\s*"\([^"]*\)"/"index_patterns": [ "\1" ]/g' *
  • In all template files, keep a single document type "doc", merging into this document type the fields from other previously coexisting types as needed. The "default" type must be removed (and merged as needed)

  • In overall mapping_events.json, add the following properties :

  "_ppf_error_document" : { "index": false, "type": "keyword" },
  "_ppf_error_message" : { "index": true, "type": "keyword" },
  "_ppf_error_timestamp" : { "format": "dateOptionalTime", "type": "date" },
  • Replace existing index templates by the migrated ones :
  # Backup your existing templates directory

  # Remove old templates from Elasticsearch
  curl -XDELETE localhost:9200/_template/mapping*

  # Insert new templates into Elasticsearch
  cd $PUNCHPLATFORM_CONF_DIR/resources/elasticsearch/templates
  punchplatform-push-es-templates.sh 

  # Check the imported templates
  curl localhost:9200/_template | jq '. | keys'

  # Test actual indexation of new data to ensure your insertions are compatible with the migrated template
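
As a closing illustration of the single-type merge described above, a migrated template declares one "doc" type holding all the merged fields (the index pattern and field names are illustrative):

```json
{
  "index_patterns" : [ "events-mytenant-*" ],
  "mappings" : {
    "doc" : {
      "properties" : {
        "message" : { "type" : "text" },
        "obs_ts" : { "type" : "date" }
      }
    }
  }
}
```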