Package org.thales.punch.apps.shiva.kafka Description
Shiva Worker logic
An example makes it a lot easier to understand the shiva scheduler logic. Say you have
a channel called "mychannel", with a single shiva application in it called "myapp".
The channel structure looks like this:
{
  "version" : "6.0",
  "start_by_tenant" : false,
  "stop_by_tenant" : true,
  "applications" : [
    {
      "type" : "shiva",
      "name" : "myapp",
      "command" : "command.sh",
      "args" : [
        "file.json"
      ],
      "cluster" : "local",
      "shiva_runner_tags" : ["local"]
    }
  ]
}
Assume you have a "tenants/mytenant/channels/shiva-basic/test" application. That string is the unique name of
the application.
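As an illustration only, the unique name is simply the path of the application below the configuration root. The tiny Java sketch below is hypothetical; the actual punch code may build this name differently.

// Hypothetical sketch: build the unique application name from its
// tenant, channel and application identifiers.
public class ApplicationNameExample {
  public static void main(String[] args) {
    String tenant = "mytenant";
    String channel = "shiva-basic";
    String application = "test";
    String uniqueName = String.join("/", "tenants", tenant, "channels", channel, application);
    System.out.println(uniqueName); // prints tenants/mytenant/channels/shiva-basic/test
  }
}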
Punch Kafka Protocol Design
Here is the logic of our protocol between the shiva daemons (leader and workers),
and between the various clients and the shiva daemons:
- Main protocol logic
- A single mono-partition topic is used by the leader and the workers to communicate.
- Start commands are published onto that topic. A start command includes both the application unique name and
the complete per-tenant archive of the configuration files.
- All workers receive these start commands but, upon receiving one, they only unzip the corresponding
application files to their data folder. They do not start anything yet.
- The leader only considers applications and workers, not the content of the application files. It decides which worker executes which application.
- Once decided, the leader publishes a single assignment json document that contains the complete list of all assignments.
- Upon receiving that assignment, workers check if they must execute an application. If so they start it. They have necessarily received the corresponding files
prior to the assignment (a worker-side sketch is given below).
- They also check if they should stop an application; if so they get rid of the corresponding application files.
- At startup no worker is known to the leader. That said, the leader considers the latest assignment document as the latest valid state.
- Hence a new leader election does not entail a storm of application stops and restarts across the cluster.
- At startup workers do nothing. They wait to receive an updated assignment. They always reply to that assignment with their identity.
- If a worker does not reply, the leader considers it dead and reassigns its applications.
- If no leader is there for a while, a worker does nothing special: it simply continues processing its assigned applications.
- Workers consume assignments using a latest offset strategy.
- The assignment history is of no interest to them.
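To make the worker side concrete, here is a minimal sketch of that loop. It is not the actual shiva implementation: the topic name "shiva-control", the record keys and the json parsing are assumptions made for illustration only.

import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;

// Illustrative worker loop: consume the single mono-partition control topic,
// ignore start/stop commands, and react to the latest assignment document.
public class WorkerAssignmentLoop {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    // Each worker uses its own group so that all workers receive every record.
    props.put("group.id", "shiva-worker-" + UUID.randomUUID());
    // Latest offset strategy: the assignment history is of no interest to workers.
    props.put("auto.offset.reset", "latest");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    Set<String> runningApps = new HashSet<>();
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("shiva-control")); // assumed topic name
      while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
          if (!"assignment".equals(record.key())) {
            continue; // only the leader processes start/stop commands
          }
          // The assignment document lists, for every worker, the applications it must run.
          Set<String> assignedToMe = parseMyAssignments(record.value());
          // Start newly assigned applications: their files were unzipped when the
          // start command was received, hence before this assignment arrived.
          for (String app : assignedToMe) {
            if (runningApps.add(app)) {
              System.out.println("starting " + app);
            }
          }
          // Stop applications no longer assigned to this worker and clean their files.
          runningApps.removeIf(app -> {
            if (!assignedToMe.contains(app)) {
              System.out.println("stopping " + app);
              return true;
            }
            return false;
          });
          // A real worker would also reply to the leader with its identity here.
        }
      }
    }
  }

  // Placeholder: extract the applications assigned to this worker from the json document.
  private static Set<String> parseMyAssignments(String assignmentJson) {
    return new HashSet<>();
  }
}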
- Start and stop commands handling
- Start and stop commands are sent by the client apps.
- This simply consists of publishing a command onto the control topic (a client-side sketch is given below).
- Workers do not process these commands, only the leader.
- Workers passively wait for an update to the assignment document from the leader.
- Only one leader can be in charge of these commands. Kafka guarantees this: the control topic has a single partition, and within the consumer group shared by the leaders that partition is assigned to a single consumer.
- At startup, if the leader is late, it will replay all the start/stop commands submitted while no leader was there.
- This replay does not pose problems because of the way the assignment is periodically refreshed by workers, i.e. there will not be a storm of start and stop commands executed
on the cluster.
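The client side is nothing more than a Kafka producer publishing onto the control topic. A minimal sketch follows, again with an assumed topic name and a simplified payload: the real start command also embeds the zipped per-tenant configuration archive.

import java.util.Properties;
import org.apache.kafka.clients.producer.*;

// Illustrative client: publish a start command onto the control topic.
public class StartCommandClient {

  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // Simplified payload: the real command also carries the complete
    // per-tenant archive of configuration files.
    String command = "{ \"command\" : \"start\","
        + " \"application\" : \"tenants/mytenant/channels/shiva-basic/test\" }";

    try (Producer<String, String> producer = new KafkaProducer<>(props)) {
      // The leader, being the only consumer of the single partition, will pick it up.
      producer.send(new ProducerRecord<>("shiva-control", "command", command)).get();
    }
  }
}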
- Synchronizing application files to all worker servers
- The principle is that workers are installed with, and refreshed with, the latest complete configuration folder.
- Workers are strictly equivalent to a regular command line environment.
- The same PUNCHPLATFORM_CONF_DIR points to where the complete resource and tenants folders are located.
- Workers receive a zip archive of that complete conf dir every time an application is started (a sketch of the archive installation is given below).
- Workers use the last committed offset strategy. This works smoothly with the compacted topic.
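For illustration, extracting the received archive into the folder pointed to by PUNCHPLATFORM_CONF_DIR could look like the following sketch; the actual worker code may of course differ.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.*;
import java.util.zip.*;

// Illustrative sketch: install the received configuration archive into
// PUNCHPLATFORM_CONF_DIR so that the worker environment is strictly
// equivalent to a regular command line environment.
public class ConfArchiveInstaller {

  public static void install(byte[] zippedConf) throws IOException {
    Path confDir = Paths.get(System.getenv("PUNCHPLATFORM_CONF_DIR"));
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zippedConf))) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        Path target = confDir.resolve(entry.getName()).normalize();
        if (!target.startsWith(confDir)) {
          throw new IOException("illegal zip entry " + entry.getName());
        }
        if (entry.isDirectory()) {
          Files.createDirectories(target);
        } else {
          Files.createDirectories(target.getParent());
          Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }
}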