Platform Metrics

Description

At the platform level, the PunchPlatform itself publishes useful metrics about the health of its own services. Platform monitoring is started automatically by the Shiva leader node, which ensures that monitoring is resilient. Make sure Shiva is properly installed to enable platform monitoring metrics.

By default, the metrics are forwarded to Elasticsearch and written to these indices:

  • platform-health-[YYYY.MM.DD]
  • platform-monitoring-[YYYY.MM.DD]
  • platform-monitoring-current

For now, the metrics are fetched at a fixed interval of 10 seconds.
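
As an illustration of the daily naming pattern listed above, here is a minimal Python sketch that builds the index name for a given instant. The helper name `daily_index_name` is hypothetical, and the use of UTC to pick the day is an assumption; the actual timezone used by the platform is not stated here.

```python
from datetime import datetime, timezone


def daily_index_name(prefix: str, when: datetime) -> str:
    """Return '<prefix>-YYYY.MM.DD' for the given instant.

    Assumption: the day boundary is computed in UTC.
    """
    return f"{prefix}-{when.astimezone(timezone.utc):%Y.%m.%d}"


when = datetime(2019, 6, 25, 6, 58, 35, tzinfo=timezone.utc)
print(daily_index_name("platform-monitoring", when))
print(daily_index_name("platform-health", when))
```

The platform-monitoring-current index has no date suffix, so it does not need such a helper.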

Platform Monitoring index

The platform-monitoring-* indices contain the metric events coming from all the PunchPlatform services defined in the properties file. These metrics report each service's health status along with additional useful information.

Two Elasticsearch indices are created:

  • platform-monitoring-[YYYY.MM.DD]: to store all events ordered by timestamp
  • platform-monitoring-current: to keep only the latest events (by service)

The first keyword in the name field is used as a prefix for the technology-specific fields (e.g. elasticsearch, kafka, zookeeper, ...).

{
  "@timestamp": "2019-06-25T06:58:35.815Z",
  "health_code": 1,
  "health_name": "green",
  "platform.id": "punchplatform-primary",
  "name": "elasticsearch.cluster",
  "type": "platform",
  "elasticsearch": {
    ...
  }
}
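
The prefix convention can be sketched in Python as follows. This is a minimal illustration that assumes the "first keyword" is the part of name before the first dot, as in "elasticsearch.cluster"; the function name `technology_fields` is hypothetical.

```python
from typing import Any, Dict, Tuple


def technology_fields(event: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return the technology prefix and the fields stored under it.

    Assumption: the prefix is the part of 'name' before the first dot.
    """
    technology = event["name"].split(".", 1)[0]
    return technology, event.get(technology, {})


event = {
    "@timestamp": "2019-06-25T06:58:35.815Z",
    "health_code": 1,
    "health_name": "green",
    "platform.id": "punchplatform-primary",
    "name": "elasticsearch.cluster",
    "type": "platform",
    "elasticsearch": {},  # technology-specific content is elided in the example
}

technology, fields = technology_fields(event)
print(technology)
```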

Here are the fields shared by all metric events:

  • @timestamp (date)

    Timestamp of the event generation.

  • health_code (integer)

    Defines the service health status as a digit.

    values: 0 (unknown), 1 (green), 2 (yellow), 3 (red)

  • health_name (string)

    Defines the service health status with a human-readable name.

    values: unknown, green, yellow, red

  • platform.id (string)

    The platform unique identifier defined in the platform properties.

  • name (string)

    The metric name identifier.

  • type (string)

    Where the metric comes from; always set to "platform" in this case.
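
To make the shared fields concrete, the following Python sketch checks that an event carries all of them and that health_code and health_name agree with the value table above. The helper `check_event` is hypothetical and is not part of the PunchPlatform; it only encodes what the field list states.

```python
# Code-to-name mapping taken from the field descriptions above.
HEALTH_NAMES = {0: "unknown", 1: "green", 2: "yellow", 3: "red"}

SHARED_FIELDS = (
    "@timestamp", "health_code", "health_name", "platform.id", "name", "type",
)


def check_event(event: dict) -> list:
    """Return a list of human-readable problems; empty if the event looks valid."""
    problems = [f"missing field: {field}" for field in SHARED_FIELDS
                if field not in event]
    code = event.get("health_code")
    if code not in HEALTH_NAMES:
        problems.append(f"unexpected health_code: {code!r}")
    elif event.get("health_name") != HEALTH_NAMES[code]:
        problems.append("health_name does not match health_code")
    if event.get("type") != "platform":
        problems.append("type is not 'platform'")
    return problems


event = {
    "@timestamp": "2019-06-25T06:58:35.815Z",
    "health_code": 1,
    "health_name": "green",
    "platform.id": "punchplatform-primary",
    "name": "elasticsearch.cluster",
    "type": "platform",
}
print(check_event(event))
```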

Platform Health index

The platform-health-* index stores an aggregate overview of the platform health. There is only one kind of document inserted; an example can be found below. The document structure has been designed to be as close as possible to the punchplatform.properties file.

Note

This document is especially useful to monitor the platform with an external automated tool (like Nagios). Refer to the monitoring guide to learn more about it.

A new metric is added every 10 seconds with the current platform state.

{
  "@timestamp": "2019-06-10T13:00:04.300Z",
  "storm": {
    "health_code": 1,
    "health_name": "green",
    "clusters": {
      "main": {
        "nimbus": {
          "hosts": {
            "punch-elitebook": {
              "health_code": 1,
              "health_name": "green"
            }
          }
        },
        "health_code": 1,
        "health_name": "green",
        "supervisor": {
          "hosts": {
            "punch-elitebook": {
              "health_code": 1,
              "health_name": "green"
            }
          }
        }
      }
    }
  },
  "elasticsearch": {
    "health_code": 1,
    "health_name": "green",
    "clusters": {
      "es_search": {
        "health_code": 1,
        "health_name": "green"
      }
    }
  },
  "zookeeper": {
    "health_code": 1,
    "health_name": "green",
    "clusters": {
      "common": {
        "health_code": 1,
        "hosts": {
          "localhost": {
            "health_code": 1,
            "health_name": "green"
          },
          "punch-elitebook": {
            "health_code": 1,
            "health_name": "green"
          }
        },
        "health_name": "green"
      }
    }
  },
  "spark": {
    "health_code": 1,
    "health_name": "green",
    "clusters": {
      "spark_main": {
        "health_code": 1,
        "health_name": "green",
        "worker": {
          "hosts": {
            "localhost": {
              "health_code": 1,
              "health_name": "green"
            }
          }
        },
        "master": {
          "hosts": {
            "localhost": {
              "health_code": 1,
              "health_name": "green"
            }
          }
        }
      }
    }
  },
  "kafka": {
    "health_code": 1,
    "health_name": "green",
    "clusters": {
      "local": {
        "brokers": {
          "0": {
            "health_code": 1,
            "health_name": "green"
          }
        },
        "health_code": 1,
        "health_name": "green"
      }
    }
  },
  "shiva": {
    "health_code": 1,
    "health_name": "green",
    "clusters": {
      "common": {
        "health_code": 1,
        "health_name": "green"
      }
    }
  },
  "platform": {
    "health_code": 1,
    "health_name": "green"
  }
}
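
As a sketch of how an external tool might consume such a document, the Python code below walks the nested structure, lists every level that is not green, and derives a Nagios-style exit code from the top-level platform entry. The helper names and the sample values are hypothetical, and the mapping from health codes to Nagios return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is an assumption, not something defined by the PunchPlatform; refer to the monitoring guide for the supported approach.

```python
def iter_health(node, path=()):
    """Yield (path, health_code, health_name) for every level reporting health."""
    if "health_code" in node:
        yield "/".join(path), node["health_code"], node.get("health_name")
    for key, child in node.items():
        if isinstance(child, dict):
            yield from iter_health(child, path + (key,))


# Assumed mapping: green -> OK, yellow -> WARNING, red -> CRITICAL, unknown -> UNKNOWN.
NAGIOS_EXIT = {1: 0, 2: 1, 3: 2, 0: 3}


def nagios_exit_code(doc):
    """Map the top-level platform health to a Nagios plugin return code."""
    code = doc.get("platform", {}).get("health_code", 0)
    return NAGIOS_EXIT.get(code, 3)


# Trimmed, hypothetical document with a degraded Kafka broker.
doc = {
    "@timestamp": "2019-06-10T13:00:04.300Z",
    "kafka": {
        "health_code": 2,
        "health_name": "yellow",
        "clusters": {
            "local": {
                "brokers": {"0": {"health_code": 2, "health_name": "yellow"}},
                "health_code": 2,
                "health_name": "yellow",
            }
        },
    },
    "shiva": {"health_code": 1, "health_name": "green"},
    "platform": {"health_code": 2, "health_name": "yellow"},
}

not_green = [(path, name) for path, code, name in iter_health(doc) if code != 1]
for path, name in not_green:
    print(f"{path}: {name}")
print("exit code:", nagios_exit_code(doc))
```

Paths are joined with "/" rather than "." because host names may themselves contain dots.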