HOWTO check Shiva Kafka topics and assignments¶
Why do that¶
On a production distributed punch, Shiva leverages Kafka topics for cluster membership, control, and application start and stop. If you suspect that a Shiva application was not successfully submitted or is not running, it is useful to check that these Kafka topics are correctly configured and available.
What to do¶
List the Shiva topics¶
Use the punchplatform-kafka-topics.sh
command to list the topics of your platform. You should easily locate
the assignement, command (cmd) and control (ctl) topics used by Shiva.
punchplatform-kafka-topics.sh --list
Tip
this command is installed on all punch operator servers.
On a standalone, these topics are typically:
platform-shiva-local-assignement
platform-shiva-local-cmd
platform-shiva-local-ctl
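On a platform with many topics, you can narrow the listing down to the Shiva ones with a simple grep. The sketch below inlines the standalone topic names shown above in place of a live punchplatform-kafka-topics.sh call:

```shell
# Filter a topic listing down to the Shiva topics.
# In a real run you would pipe the output of
# 'punchplatform-kafka-topics.sh --list' into grep; here the
# standalone topic names are inlined for illustration.
list_topics() {
  cat <<'EOF'
platform-admin
platform-shiva-local-assignement
platform-shiva-local-cmd
platform-shiva-local-ctl
EOF
}

list_topics | grep 'shiva'
```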
Check their status¶
If you have only a few topics on your platform, the simplest is:
punchplatform-kafka-topics.sh --describe
punchplatform-kafka-topics.sh --topic shiva-local-assignement --describe
Here is a sample output:
Topic: platform-shiva-common-ctl PartitionCount: 1 ReplicationFactor: 1 Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
Topic: platform-shiva-common-ctl Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: platform-shiva-common-cmd PartitionCount: 1 ReplicationFactor: 1 Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
Topic: platform-shiva-common-cmd Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: platform-admin PartitionCount: 1 ReplicationFactor: 1 Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
Topic: platform-admin Partition: 0 Leader: 3 Replicas: 3 Isr: 3
Topic: platform-shiva-common-assignement PartitionCount: 1 ReplicationFactor: 1 Configs: compression.type=gzip,cleanup.policy=compact,segment.bytes=1073741824,min.cleanable.dirty.ratio=0.5,retention.bytes=104857600,delete.retention.ms=86400000,segment.ms=604800000
Topic: platform-shiva-common-assignement Partition: 0 Leader: 2 Replicas: 2 Isr: 2
What is essential is that each of these topics has a single partition and an assigned Kafka leader.
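That check can be automated with a small awk filter over the --describe output. The sketch below inlines a sample (in which the assignement topic is deliberately given 2 partitions, to show what a misconfiguration looks like):

```shell
# Flag any topic whose partition count is not 1.
# The sample --describe output is inlined for illustration; in a
# real run, pipe 'punchplatform-kafka-topics.sh --describe' into
# the awk filter instead.
describe_topics() {
  cat <<'EOF'
Topic: platform-shiva-common-ctl PartitionCount: 1 ReplicationFactor: 1
Topic: platform-shiva-common-cmd PartitionCount: 1 ReplicationFactor: 1
Topic: platform-shiva-common-assignement PartitionCount: 2 ReplicationFactor: 1
EOF
}

describe_topics | awk '$3 == "PartitionCount:" && $4 != 1 {
  print $2 " has " $4 " partitions (expected 1)"
}'
```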
In addition, see this section to check that your Shiva topics are correctly configured.
Check the group and consumer identifiers¶
This is optional but can be useful to check misconfiguration issues. Each Kafka consumer is identified by a pair group-id/consumer-id. You can see the group identifiers using:
punchplatform-kafka-consumers.sh --list
punchplatform-kafka-consumers.sh --describe --group local-leader
Refer to this section to see the results you should obtain when executing these commands.
Check worker health¶
All alive worker nodes periodically publish a heartbeat message to the control (ctl) topic.
You can see which runners are alive with this command (wait a few seconds to see updates from live runners):
/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic platform-shiva-local-ctl --property print.key=true --property print.timestamp=true --property key.separator="==> " | grep worker
CreateTime:1604403388118==> pong_v1==> {"id":"worker-server3","tags":["local","server3"]}
CreateTime:1604403388119==> pong_v1==> {"id":"worker-server4","tags":["local","server4"]}
You can easily convert a timestamp using 'date' (first strip the milliseconds, i.e. the last three digits of the timestamp):
date --date=@1598460813
Wed Aug 26 18:53:33 CEST 2020
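Since the heartbeat CreateTime values are expressed in milliseconds, you can also let the shell do the division instead of trimming digits by hand:

```shell
# Convert a millisecond Kafka CreateTime to a human-readable date.
# 1604403388118 is the CreateTime from the heartbeat sample above.
ts_ms=1604403388118
date --date=@$((ts_ms / 1000))
```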
Check leader health¶
The leader node is the one that is consuming the control partition.
The leader node uses a Kafka consumer group named after its leader identifier (for instance 'leader-server4' in the output below).
You can find the current leader by running the following command:
/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic shiva-local-assignement | head -n 1 | jq .leader_id
"leader-server4"
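You can check the same extraction offline against a captured assignment message. This is a sketch using a minimal, hypothetical fragment of such a message, and assumes jq is installed:

```shell
# Extract the current leader id from an assignment document.
# The JSON below is a minimal, hypothetical fragment of a real
# assignment message (see the full sample further below).
message='{"leader_id":"leader-server4","assignements":{}}'
echo "$message" | jq -r '.leader_id'
```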
Then, you can check the current leader health by using the kafka-consumer-groups.sh tool '--describe' command. A shortcut is to use its punchplatform wrapper:
punchplatform-kafka-consumers.sh --kafkaCluster local --describe --group leader-server4
bootstrap servers : 'server2:9092'
kafka consumers for kafka cluster 'local'...
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
leader-server4 shiva-local-ctl 0 1380 1380 0 consumer-leader-server4-2-056c32f0-81b1-439a-93b8-a9728c2d7757 /172.28.128.24 consumer-leader-server4-2
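A healthy leader shows a small, stable LAG on the control topic. The sketch below extracts the lag column from the wrapper output with awk, inlining a trimmed copy of the sample above in place of a live call:

```shell
# Extract the per-partition lag from a consumer group description.
# The sample output is inlined; in a real run, pipe the output of
# 'punchplatform-kafka-consumers.sh --describe --group <group>'
# into the awk filter instead.
describe_group() {
  cat <<'EOF'
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
leader-server4 shiva-local-ctl 0 1380 1380 0 consumer-leader-server4-2 /172.28.128.24 consumer-leader-server4-2
EOF
}

describe_group | awk '$1 != "GROUP" {print $2 ": lag=" $6}'
```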
Check assignments¶
The leader node periodically (every few seconds) updates the assignment of tasks to live runners, choosing, for each task, runners that carry its required 'tags'.
You can see the current assignments with this command (wait a few seconds to see updates from live runners):
/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server server2:9092 --topic shiva-processing_shiva-assignement | head -n 1 | jq .
{
"cluster_name": "processing_shiva",
"election_timestamp": "2020-08-26T16:31:57.062Z",
"assignements": {
"pbrpmishiv02": [
"tenants/platform/channels/platform_monitoring/platform_health"
],
"pbrpmishiv01": [
"tenants/platform/channels/platform_monitoring/local_events_dispatcher"
]
},
"leader_id": "pbrpmishiv01",
"state": {
"workers": {
"pbrpmishiv02": {
"id": "pbrpmishiv02",
"tags": [
"pbrpmishiv02"
]
},
"pbrpmishiv01": {
"id": "pbrpmishiv01",
"tags": [
"pbrpmishiv01"
]
}
},
"applications": {
"tenants/platform/channels/platform_monitoring/platform_health": {
"name": "tenants/platform/channels/platform_monitoring/platform_health",
"tags": []
},
"tenants/platform/channels/platform_monitoring/local_events_dispatcher": {
"name": "tenants/platform/channels/platform_monitoring/local_events_dispatcher",
"tags": []
}
}
},
"version": "5.0",
"unassigned_tasks": [],
"applications": {
"tenants/platform/channels/platform_monitoring/platform_health": {
"args": [
"platform-monitoring",
"platform_health.json"
],
"cluster_name": "processing_shiva",
"name": "tenants/platform/channels/platform_monitoring/platform_health",
"execution_schedule": "",
"tags": []
},
"tenants/platform/channels/platform_monitoring/local_events_dispatcher": {
"args": [
"punchline",
"--mode",
"light",
"--punchline",
"local_events_dispatcher.hjson"
],
"cluster_name": "processing_shiva",
"name": "tenants/platform/channels/platform_monitoring/local_events_dispatcher",
"execution_schedule": "",
"tags": []
}
}
}
You can get a more compact display with a jq filter:
/data/opt/kafka_2.12-2.8.1/bin/kafka-console-consumer.sh --bootstrap-server pbrpmizkkaf01:9093 --topic shiva-processing_shiva-assignement | head -n 1 |jq -r '(.assignements | to_entries[] | .key as $HOST | .value[] | ( . + " ==> " + $HOST) )'
tenants/platform/channels/platform_monitoring/platform_health ==> pbrpmishiv02
tenants/platform/channels/platform_monitoring/local_events_dispatcher ==> pbrpmishiv01
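The filter can also be tried offline against a captured assignment document. This sketch applies it to a trimmed copy of the sample above, and assumes jq is installed:

```shell
# Apply the compact-display jq filter to a trimmed assignment
# document (only the "assignements" map is kept from the sample).
doc='{"assignements":{"pbrpmishiv02":["tenants/platform/channels/platform_monitoring/platform_health"],"pbrpmishiv01":["tenants/platform/channels/platform_monitoring/local_events_dispatcher"]}}'
echo "$doc" | jq -r '(.assignements | to_entries[] | .key as $HOST | .value[] | ( . + " ==> " + $HOST) )'
```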