Metrics Handling

The PunchPlatform Storm components (bolts and spouts), as well as the system-level metrics collectors, publish various metrics in real time. These metrics are of several types:

  • Histograms : measure the statistical distribution of values in a stream of data. In addition to minimum, maximum, mean, etc., they also measure the median and the 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles.
  • Meters : measure the rate of events over time (e.g., "requests per second"). In addition to the mean rate, meters also track 1-, 5-, and 15-minute moving averages.
  • Counters : a counter is just a gauge for an AtomicLong instance.

These metrics are published periodically to a metrics storage backend, either a logger or an Elasticsearch cluster. You can activate them with no impact on platform performance: they are designed to cope with high-traffic processing. All statistical computations are performed in real time in the Storm workers; only the consolidated result is sent periodically to the backend.

Metrics are not stored/indexed in Elasticsearch the way logs are indexed.

Features

  • Periodic (~10s, customizable) grouped sending of metric values to the metrics backend(s).
  • Supported metrics backends are:

    • "metric name/metric value" type backends:

      • logback appender
    • "tagged values" type backends (i.e. additional information can accompany a metric value, instead of being encoded inside the metric name):

      • elasticsearch
  • Sending is non-blocking; if a backend is unreachable, metric values are lost while the functional behaviour of the software continues; when the backend becomes available again, sending of new metric values succeeds.
  • Functional parts of the software can provide instantaneous measures or counter increments.
  • The metrics layer takes care of the associated computed statistics (counter aggregation, long-term average, 1-minute mean, etc.) and of providing a single value per sending period ("bucketing").
  • "uptime" metrics are generic, automatically increased time counters that indicate that the process is still alive.
  • Metrics can be sent directly by a process to the backend(s), and can also be forwarded through the PunchPlatform channel and sent by another process of the channel (acting as a "metrics proxy").
  • Direct sending to the various backends and forwarding are settings that can be overridden at the topology settings level.
  • Metrics proxy activation can be configured at the spout settings level.

Common subseries of PunchPlatform metrics

For all metrics indicating event rates and latency measures, the metric is automatically expanded into multiple data subseries:

  • m1_rate : the exponentially weighted moving average (EWMA) of the rate over a 1-minute period, i.e. 1 minute is the mean lifetime of each reading in the average, so roughly 60% of the weight comes from the last minute (see the sketch after this list).
  • mean_rate : a long-term moving average with an exponential temporal attenuation.
  • count : the cumulative number of values registered (this value is often meaningless, except for message counts, where it actually represents the number of messages accumulated over time).
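
As an illustration, here is how such an EWMA is typically computed, assuming a standard Dropwizard-style meter (a sketch: the 5-second tick interval and the exact constant are assumptions, not PunchPlatform specifics). At every tick, the 1-minute rate is updated from the instantaneous rate r measured over the last tick:

    m1_rate ← m1_rate + α × (r − m1_rate),   with α = 1 − e^(−5/60)

Old readings thus decay exponentially, with a mean lifetime of 1 minute in the average.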

For latency and size measures, additional data series derived from the base measure are provided:

  • stddev : the standard deviation
  • max : the maximal value
  • min : the minimal value

When using the Elasticsearch backend, these values are grouped as a single metric record, as subfields of the metric object. The series value field is therefore the name of the metric, then a '.', then the subseries identifier:
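
For instance, a metric record could look like the following sketch (the metric name "storm.tuple.ack" comes from the examples below; the exact envelope fields of real records may differ):

{
    "@timestamp": "2018-04-25T15:09:38.918Z",
    "storm.tuple.ack": {
        "count": 1234,
        "m1_rate": 12.5
    }
}

The series value fields here are "storm.tuple.ack.count" and "storm.tuple.ack.m1_rate".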

To know the available subseries, refer to the following table and to the metrics types listed in the UserGuideSpouts and UserGuideBolts documents.


Type        Subseries
---------   ----------------------------------------------
Counter     count
Gauge       gauge
Histogram   count, max, min, mean, stddev
Meter       count, m1_rate
Timer       count, m1_rate, max, min, mean, stddev
TimeValue   "max", "ts_diff", ... (depends on the metric)


Metrics system configuration

The PunchPlatform spout/bolt metrics are published, by configuration, towards one or several reporters. This metrics configuration can be defined in any topology file, and you can add as many reporters as needed. For example, a metrics section in a topology file can look like this:

{
  "tenant": "mytenant",
  "channel": "sourcefire",
  "name": "single",
  "spouts": [
    ...
  ],
  "bolts": [
    ...
  ],
  "metrics": {
    "reporters": [
      {
        "type": "elasticsearch",
        ...
      },
      {
        "type": "logger",
        ...
      }
    ]
  },
  "storm_settings": {
    ...
  }
}

Common reporter configuration


  • type (String, mandatory) : reporter type, one of "elasticsearch", "kafka", "lumberjack", "logger", "tcp", "udp".
  • reporting_interval (Integer, optional, default: 10) : period of metrics publication, in seconds.


"metrics": {
    "reporters": [
        {
            "type": "elasticsearch",
            "reporting_interval": "10"
            ...
        },
        {
            "type": "kafka",
            ...
        }
    ]
}

Elasticsearch Metrics Reporting

In order to make topologies send their metrics to Elasticsearch, add an elasticsearch reporter to the topology file, as illustrated next:

"metrics": {
  "reporters": [
    {
      "type": "elasticsearch",
      "cluster_name": "es_search"
    },
    {
      "type": "elasticsearch",
      "reporting_interval": 10,
      "index_name": "metrics",
      "index_type": "doc",
      "http_hosts": [
        {
          "host": "localhost",
          "port": 9200
        }
      ]
    }
  ]
}

  • cluster_name (String) : Elasticsearch target cluster name. Mandatory if "http_hosts" is not set.
  • index_name (String, optional, default: "[tenant]-metrics-") : index name prefix (see below for the suffix).
  • index_type (String, optional, default: "doc") : the index document type.
  • index_suffix_date_pattern (String, optional, default: "yyyy.MM.dd") : index name suffix (Java SimpleDateFormat standard).
  • http_hosts (Array, optional, default: [{"host": "[your_host]", "port": 9200}]) : hosts and ports to reach the Elasticsearch cluster. Mandatory if "cluster_name" is not set.

For example, with the default settings, the metrics of tenant "mytenant" on April 25th, 2018 are indexed in "mytenant-metrics-2018.04.25".


Log Metrics Reporting

To make topologies log their metrics through a standard software logger (for example, for appending to a file), add a logger reporter to the topology file, as illustrated next:

"metrics" : {
    "reporters" : [
        {
            "type" : "logger",
            "reporting_interval" : 5,
            "format" : "kv"
        }
    ]
}

  • format ("kv" or "json", optional, default: "json") : "kv" is more log-parsing friendly, but "json" shows the full metric document as it would be recorded inside Elasticsearch.


You can then configure the Storm logger to either log the metrics to a file, or to a centralized logging service if one is available in your platform. In the $STORM_INSTALL_DIR/log4j/worker.xml file of each worker machine, include the following logger:

<!--
    Metrics sent to elasticsearch or to the logger reporter are prefixed with
    "punchplatform", or something else if you change the platform_id in
    your punchplatform.properties.
-->

<logger name="com.thales.services.cloudomc.punchplatform.storm.core.metrics.LogReporter" level="INFO"/>

Note

On a standalone installation, this file is located in the $PUNCHPLATFORM_CONF_DIR/external/apache-x.y.z/log4j folder. Note also that whenever you start a topology in local mode, a different logger configuration file is used: $PUNCHPLATFORM_INSTALL_DIR/bin/logback-topology.xml.

Kafka Metrics Reporting

In order to make topologies send their metrics to a Kafka topic, add a kafka reporter to the topology file, as illustrated next:

"metrics" : {
    "reporters" : [
        {
            "type" : "kafka",
            "bootstrap.servers" : "host1:9092,host2:9092",
            "topic": "<topic_name>"
            "reporting_interval" : 30
        }
    ]
},

  • topic (String, mandatory) : Kafka topic into which the metrics tuples will be written.
  • bootstrap.servers (String, mandatory) : comma-separated list of Kafka bootstrap servers.
  • brokers (String, optional, legacy) : Kafka brokers cluster name, matching a kafka cluster declared in punchplatform.properties.
  • metric_document_field_name (String, optional, default: "log") : Storm field name into which the metric document (JSON) is sent. This should match the name of the published field in the topology.
  • metric_type_field_name (String, default: "metric_type") : name of the Storm field into which the name (type) of the metric is put. This makes it possible to store the metric using an Elasticsearch bolt further in the metrics topologies, benefiting from different metrics mappings and retention strategies. To allow its use as an Elasticsearch document type, the metric name is automatically turned into a type by replacing each "." with "_"; a metric named "storm.tuple.ack" therefore has a type equal to "storm_tuple_ack".
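
To make this concrete, each record produced by this reporter could carry two Storm fields like the following sketch (illustrative values; the actual metric document layout may differ):

{
    "log": {
        "@timestamp": "2018-04-25T15:09:38.918Z",
        "storm.tuple.ack": {
            "count": 1234,
            "m1_rate": 12.5
        }
    },
    "metric_type": "storm_tuple_ack"
}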


Lumberjack Metrics Reporting

In order to make topologies forward their metrics through Lumberjack, add a Lumberjack reporter to the topology file, as illustrated next:

"metrics" : {
    "reporters" : [
        {
            "type" : "lumberjack",
            "metric_document_field_name" : "log",
            "metric_type_field_name" : "metric_type",
            "reporting_interval" : 30,
            "destination" : [
                {
                    "compression" : true,
                    "host" : "target.ip.address",
                    "port" : 9999
                }
            ]
        }
    ]
}

  • metric_document_field_name (String, optional, default: "log") : Storm field name into which the metric document (JSON) is sent. This should match the name of the published field in the topology.
  • metric_type_field_name (String, default: "metric_type") : name of the Storm field into which the name (type) of the metric is put; see the kafka reporter above for details on how metric names are turned into Elasticsearch document types.
  • startup_connection_timeout (integer, optional, default: 1000 ms) : time waited during the initial startup phase before the decision is made to target the group that has reached the maximum weight (no effect when only one group is targeted).
  • group_connection_timeout (integer, optional, default: 1000 ms) : time waited during the initial startup phase before it is considered an error to have no available target group.
  • destination (structure, mandatory) : set of parameters identifying one or several destinations. See the syslog_bolt configuration for details (for Lumberjack, only the tcp protocol is available).
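
For instance, a two-host destination could look like this sketch (host names are placeholders; refer to the syslog_bolt documentation for the complete destination syntax):

"destination" : [
    {
        "compression" : true,
        "host" : "target1.ip.address",
        "port" : 9999
    },
    {
        "compression" : true,
        "host" : "target2.ip.address",
        "port" : 9999
    }
]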


Socket Metrics Reporting

In order to make topologies forward their metrics through syslog TCP, add a socket reporter to the topology file, as illustrated next:

"metrics" : {
    "reporters" : [
        "socket_metric_reporter" : {
            "type" : "tcp",
            "reporting_interval" : 30,
            "destination" : [
                {
                    "compression" : true,
                    "host" : "[your_dest_address]",
                    "port" : 9999
                }
            ]
        }
    ]
}

  • startup_connection_timeout (integer, optional, default: 1000 ms) : time waited during the initial startup phase before the decision is made to target the group that has reached the maximum weight (no effect when only one group is targeted).
  • group_connection_timeout (integer, optional, default: 1000 ms) : time waited during the initial startup phase before it is considered an error to have no available target group.
  • destination (structure, mandatory) : set of parameters identifying one or several destinations. See the syslog_bolt configuration for details (as for Lumberjack, only the tcp protocol is available).


Metrics in Apache Spark

Activate Spark metrics

For now, the only way to activate Spark metrics is by setting a tenant at launch time. For example, when you start a job or a plan, do not forget the --tenant option. Here is an example:

$ punchplatform-analytics.sh --job [your_example].json --spark-master spark_main --deploy-mode cluster --tenant mytenant

This way, various Spark metrics will be reported to Elasticsearch using the index pattern [tenant]-metrics-YYYY.MM.DD. In the above example, it would have been mytenant-metrics-2018.04.25.
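
To check that the metrics are flowing, you can for instance run a standard Elasticsearch search against that index (the index name comes from the example above; the query simply filters on the metricset.type field shown in the next section):

{
    "query": {
        "term": {
            "metricset.type": "spark"
        }
    }
}

posted to mytenant-metrics-2018.04.25/_search (or mytenant-metrics-*/_search).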

Metrics format and content

Here is an example of an Apache Spark metric.

{
    "@timestamp": "2018-04-25T15:09:38.918Z",
    "metricset": {
        "name": "job",
        "runtime_id": "0ef774b0-1ea5-4008-ab75-836c27ec79b8",
        "type": "spark"
    },
    "job": {
        "size": 1,
        "stages": 1,
        "id": 0,
        "event": "start"
    }
}

As we can see, this event reports a job start. See the next section for all existing events.

Regarding Spark metric parameters, the ones present in every metric are listed in this table:


  • @timestamp (date) : the datetime of the event.
  • metricset.name (string) : the event name.
  • metricset.type (string) : the technology name, here always "spark".
  • metricset.runtime_id (string) : a UUID assigned by the PunchPlatform to identify all events from the same application.


Events reported

Any reported event is generated by Spark itself: internally, we rely on a SparkListener Java class that implements all the methods listed in the official documentation.

To get more information, see the Apache Spark documentation.