HOWTO check Shiva Kafka topics and assignments

Why do that

On a production distributed punch, Shiva leverages Kafka topics for cluster membership, control, and application start and stop commands. If you have doubts about a Shiva application not being successfully submitted or running, it is useful to ensure these Kafka topics are correctly configured and available.

What to do

List the Shiva topics

Use the punchplatform-kafka-topics.sh command to list the topics of your platform. You should easily locate the assignment, command (cmd) and control (ctl) topics used by Shiva.

punchplatform-kafka-topics.sh --list

Tip

This command is installed on all punch operator servers.

On a standalone platform, these topics are typically:

platform-shiva-local-assignement
platform-shiva-local-cmd
platform-shiva-local-ctl
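
If your platform hosts many topics, you can filter the list; a minimal sketch, assuming (as in the standalone names above) that the Shiva topic names contain 'shiva':

punchplatform-kafka-topics.sh --list | grep shiva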

Check their status

If you have only a few topics on your platform, the simplest is:

punchplatform-kafka-topics.sh --describe

Or you can selectively inspect each topic as follows:

punchplatform-kafka-topics.sh --topic shiva-local-assignement --describe

Here is a sample output:

Topic: platform-shiva-common-ctl    PartitionCount: 1   ReplicationFactor: 1    Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
    Topic: platform-shiva-common-ctl    Partition: 0    Leader: 1   Replicas: 1 Isr: 1
Topic: platform-shiva-common-cmd    PartitionCount: 1   ReplicationFactor: 1    Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
    Topic: platform-shiva-common-cmd    Partition: 0    Leader: 2   Replicas: 2 Isr: 2
Topic: platform-admin   PartitionCount: 1   ReplicationFactor: 1    Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
    Topic: platform-admin   Partition: 0    Leader: 3   Replicas: 3 Isr: 3
Topic: platform-shiva-common-assignement    PartitionCount: 1   ReplicationFactor: 1    Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
    Topic: platform-shiva-common-assignement    Partition: 0    Leader: 2   Replicas: 2 Isr: 2

The essential point is that each of these topics has a single partition and an assigned Kafka leader.
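
To automate that check, here is a minimal sketch assuming the standalone topic names listed above. In the describe output, a leader of '-1' (or 'none' on recent Kafka versions) means no leader is assigned:

# Sanity-check each Shiva topic: a single partition, and a leader assigned.
# The topic names are the standalone defaults; adapt them to your platform.
for topic in platform-shiva-local-assignement platform-shiva-local-cmd platform-shiva-local-ctl; do
  desc=$(punchplatform-kafka-topics.sh --topic "$topic" --describe)
  echo "$desc" | grep -q 'PartitionCount: 1' || echo "WARNING: $topic does not have a single partition"
  echo "$desc" | grep -Eq 'Leader: (-1|none)' && echo "WARNING: $topic has a partition without leader"
done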

In addition, see this section to check that your Shiva topics are properly configured.

Check the group and consumer identifiers

This is optional but can help spot misconfiguration issues. Each Kafka consumer is identified by a group-id/consumer-id pair. You can list the group identifiers using:

punchplatform-kafka-consumers.sh --list

punchplatform-kafka-consumers.sh --describe --group local-leader

Check this section to see the expected results of these commands.
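
For instance, to keep only the leader groups in a long listing (assuming, as in the examples on this page, that their names contain 'leader'):

punchplatform-kafka-consumers.sh --list | grep leader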

Check worker health

All live worker nodes periodically publish a heartbeat message to the control (ctl) topic.

You can see which runners are alive through this command (wait a few seconds to see updates from live runners):

/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic platform-shiva-local-ctl --property print.key=true --property print.timestamp=true --property key.separator="==> " | grep worker

CreateTime:1604403388118==> pong_v1==> {"id":"worker-server3","tags":["local","server3"]}
CreateTime:1604403388119==> pong_v1==> {"id":"worker-server4","tags":["local","server4"]}
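
To get a one-shot list of live workers instead of a continuous stream, you can bound the consumption with 'timeout' and deduplicate the heartbeat identifiers. A minimal sketch reusing the broker and topic of the command above:

# Listen for 15 seconds, keep the worker heartbeats only, and print each worker id once.
timeout 15 /data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic platform-shiva-local-ctl | grep worker | jq -r .id | sort -u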

You can easily convert a timestamp using 'date' (first strip the milliseconds from the timestamp):

date --date=@1598460813

Wed Aug 26 18:53:33 CEST 2020
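
The milliseconds can also be stripped arithmetically; a small sketch using the CreateTime value from the heartbeat output above:

# CreateTime is in milliseconds since the epoch: divide by 1000 before feeding it to 'date'.
ts_ms=1604403388118
date --date=@$((ts_ms / 1000))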

Check leader health

The leader node is the one consuming the control partition. It uses a dedicated Kafka consumer group whose name contains 'leader' (for example 'leader-server4' in the output below).

You can find the current leader by running the following command:

/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic shiva-local-assignement | head -n 1 | jq .leader_id
"leader-server4"

Then, you can check the current leader health by using the kafka-consumer-groups.sh tool's '--describe' command. A shortcut is to use its punchplatform wrapper:

punchplatform-kafka-consumers.sh --kafkaCluster local --describe --group leader-server4

bootstrap servers : 'server2:9092'

kafka consumers for kafka cluster 'local'...

GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                    HOST            CLIENT-ID
leader-server4  shiva-local-ctl 0          1380            1380            0               consumer-leader-server4-2-056c32f0-81b1-439a-93b8-a9728c2d7757 /172.28.128.24  consumer-leader-server4-2
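
Both steps can be chained, since the consumer group name matches the leader_id published in the assignment topic (as shown above). A sketch reusing the same broker, topic and cluster names:

# Fetch the current leader id from the assignment topic, then describe its consumer group.
leader=$(/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic shiva-local-assignement | head -n 1 | jq -r .leader_id)
punchplatform-kafka-consumers.sh --kafkaCluster local --describe --group "$leader"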

Check assignments

The leader node periodically (every few seconds) updates the assignment of tasks to live runners, taking into account the required 'tags' of each task to choose runners bearing those tags.

You can see the current assignments through this command (wait a few seconds for the leader to publish an update):

/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic shiva-processing_shiva-assignement | head -n 1 | jq .

{
  "cluster_name": "processing_shiva",
  "election_timestamp": "2020-08-26T16:31:57.062Z",
  "assignements": {
    "pbrpmishiv02": [
      "tenants/platform/channels/platform_monitoring/platform_health"
    ],
    "pbrpmishiv01": [
      "tenants/platform/channels/platform_monitoring/local_events_dispatcher"
    ]
  },
  "leader_id": "pbrpmishiv01",
  "state": {
    "workers": {
      "pbrpmishiv02": {
        "id": "pbrpmishiv02",
        "tags": [
          "pbrpmishiv02"
        ]
      },
      "pbrpmishiv01": {
        "id": "pbrpmishiv01",
        "tags": [
          "pbrpmishiv01"
        ]
      }
    },
    "applications": {
      "tenants/platform/channels/platform_monitoring/platform_health": {
        "name": "tenants/platform/channels/platform_monitoring/platform_health",
        "tags": []
      },
      "tenants/platform/channels/platform_monitoring/local_events_dispatcher": {
        "name": "tenants/platform/channels/platform_monitoring/local_events_dispatcher",
        "tags": []
      }
    }
  },
  "version": "5.0",
  "unassigned_tasks": [],
  "applications": {
    "tenants/platform/channels/platform_monitoring/platform_health": {
      "args": [
        "platform-monitoring",
        "platform_health.json"
      ],
      "cluster_name": "processing_shiva",
      "name": "tenants/platform/channels/platform_monitoring/platform_health",
      "execution_schedule": "",
      "tags": []
    },
    "tenants/platform/channels/platform_monitoring/local_events_dispatcher": {
      "args": [
        "punchline",
        "--mode",
        "light",
        "--punchline",
        "local_events_dispatcher.hjson"
      ],
      "cluster_name": "processing_shiva",
      "name": "tenants/platform/channels/platform_monitoring/local_events_dispatcher",
      "execution_schedule": "",
      "tags": []
    }
  }
}

You can get a more compact display through a jq filter:

/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server pbrpmizkkaf01:9093 --topic shiva-processing_shiva-assignement | head -n 1 | jq -r '(.assignements | to_entries[] | .key as $HOST | .value[] | ( . + " ==> " + $HOST) )'

tenants/platform/channels/platform_monitoring/platform_health ==> pbrpmishiv02
tenants/platform/channels/platform_monitoring/local_events_dispatcher ==> pbrpmishiv01
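
Similarly, a quick way to verify that no task is left unassigned is to inspect the 'unassigned_tasks' field of the same record; it should print an empty array:

# An empty array means every task has been assigned to a runner.
/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server pbrpmizkkaf01:9093 --topic shiva-processing_shiva-assignement | head -n 1 | jq '.unassigned_tasks'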