Package org.thales.punch.apps.shiva.kafka Description
Shiva Worker logic
An example makes it a lot easier to understand the shiva scheduler logic. Say you have
a channel called "mychannel", with a single shiva application in it called "myapp".
The channel structure looks like this:
{
  "version" : "6.0",
  "start_by_tenant" : false,
  "stop_by_tenant" : true,
  "applications" : [
    {
      "type" : "shiva",
      "name" : "myapp",
      "command" : "command.sh",
      "args" : [
        "file.json"
      ],
      "cluster" : "local",
      "shiva_runner_tags" : ["local"]
    }
  ]
}
Assume you have a "tenants/mytenant/channels/shiva-basic/test" application. That string is the unique name of
the application.
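As an illustration only, the unique name is simply the path of the application below the configuration root. The tiny Java sketch below is hypothetical; the actual punch code may build this name differently.

// Hypothetical sketch: build the unique application name from its
// tenant, channel and application identifiers.
public class ApplicationNameExample {
  public static void main(String[] args) {
    String tenant = "mytenant";
    String channel = "shiva-basic";
    String application = "test";
    String uniqueName = String.join("/", "tenants", tenant, "channels", channel, application);
    System.out.println(uniqueName); // prints tenants/mytenant/channels/shiva-basic/test
  }
}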
Punch Kafka Protocol Design
Here is the logic of our protocol between the shiva daemons (leader and workers),
and between the various clients and the shiva daemons:
- Main protocol logic
- A single mono-partition topic is used by the leader and the workers to communicate.
- Start commands are published onto that topic. A start command includes both the application unique name and
the complete per-tenant archive of the configuration files.
- All workers receive these start commands but, upon receiving one, they only unzip the corresponding
application files to their data folder. They do not start anything yet.
- The leader only considers applications and workers, not the content of the application files. It decides which worker executes which application.
- Once decided, the leader publishes a single assignment json document that contains the complete list of all assignments.
- Upon receiving that assignment, workers check if they must execute an application. If so they start it. They have necessarily received the corresponding files
prior to the assignment (a worker-side sketch is given below).
- They also check if they should stop an application; if so they get rid of the corresponding application files.
- At startup no worker is known to the leader. That said, the leader considers the latest assignment document as the latest valid state.
- Hence a new leader election does not entail a storm of application stops and restarts across the cluster.
- At startup workers do nothing. They wait to receive an updated assignment. They always reply to that assignment with their identity.
- If a worker does not reply, the leader considers it dead and reassigns its applications.
- If no leader is there for a while, a worker does nothing special: it simply continues processing its assigned applications.
- Workers consume assignments using a latest offset strategy.
- The assignment history is of no interest to them.
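To make the worker side concrete, here is a minimal sketch of that loop. It is not the actual shiva implementation: the topic name "shiva-control", the record keys and the json parsing are assumptions made for illustration only.

import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;

// Illustrative worker loop: consume the single mono-partition control topic,
// ignore start/stop commands, and react to the latest assignment document.
public class WorkerAssignmentLoop {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    // Each worker uses its own group so that all workers receive every record.
    props.put("group.id", "shiva-worker-" + UUID.randomUUID());
    // Latest offset strategy: the assignment history is of no interest to workers.
    props.put("auto.offset.reset", "latest");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    Set<String> runningApps = new HashSet<>();
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("shiva-control")); // assumed topic name
      while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
          if (!"assignment".equals(record.key())) {
            continue; // only the leader processes start/stop commands
          }
          // The assignment document lists, for every worker, the applications it must run.
          Set<String> assignedToMe = parseMyAssignments(record.value());
          // Start newly assigned applications: their files were unzipped when the
          // start command was received, hence before this assignment arrived.
          for (String app : assignedToMe) {
            if (runningApps.add(app)) {
              System.out.println("starting " + app);
            }
          }
          // Stop applications no longer assigned to this worker and clean their files.
          runningApps.removeIf(app -> {
            if (!assignedToMe.contains(app)) {
              System.out.println("stopping " + app);
              return true;
            }
            return false;
          });
          // A real worker would also reply to the leader with its identity here.
        }
      }
    }
  }

  // Placeholder: extract the applications assigned to this worker from the json document.
  private static Set<String> parseMyAssignments(String assignmentJson) {
    return new HashSet<>();
  }
}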
- Start and stop commands handling
- Start and stop commands are sent by the client apps.
- This simply consists of publishing a command onto the control topic (a client-side sketch is given below).
- Workers do not process these commands, only the leader.
- Workers passively wait for an update to the assignment document from the leader.
- Only one leader can be in charge of these commands. Kafka guarantees this: the control topic has a single partition, and within the consumer group shared by the leaders that partition is assigned to a single consumer.
- At startup, if the leader is late, it will replay all the start/stop commands submitted while no leader was there.
- This replay does not pose problems because of the way the assignment is periodically refreshed by workers, i.e. there will not be a storm of start and stop commands executed
on the cluster.
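The client side is nothing more than a Kafka producer publishing onto the control topic. A minimal sketch follows, again with an assumed topic name and a simplified payload: the real start command also embeds the zipped per-tenant configuration archive.

import java.util.Properties;
import org.apache.kafka.clients.producer.*;

// Illustrative client: publish a start command onto the control topic.
public class StartCommandClient {

  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    // Simplified payload: the real command also carries the complete
    // per-tenant archive of configuration files.
    String command = "{ \"command\" : \"start\","
        + " \"application\" : \"tenants/mytenant/channels/shiva-basic/test\" }";

    try (Producer<String, String> producer = new KafkaProducer<>(props)) {
      // The leader, being the only consumer of the single partition, will pick it up.
      producer.send(new ProducerRecord<>("shiva-control", "command", command)).get();
    }
  }
}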
- Synchronizing application files to all worker servers
- The principle is that workers are installed with, and refreshed with, the latest complete configuration folder.
- Workers are strictly equivalent to a regular command line environment.
- The same PUNCHPLATFORM_CONF_DIR points to where the complete resource and tenants folders are located.
- Workers receive a zip archive of that complete conf dir every time an application is started (a sketch of the archive installation is given below).
- Workers use the last committed offset strategy. This works smoothly with the compacted topic.
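For illustration, extracting the received archive into the folder pointed to by PUNCHPLATFORM_CONF_DIR could look like the following sketch; the actual worker code may of course differ.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.*;
import java.util.zip.*;

// Illustrative sketch: install the received configuration archive into
// PUNCHPLATFORM_CONF_DIR so that the worker environment is strictly
// equivalent to a regular command line environment.
public class ConfArchiveInstaller {

  public static void install(byte[] zippedConf) throws IOException {
    Path confDir = Paths.get(System.getenv("PUNCHPLATFORM_CONF_DIR"));
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zippedConf))) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        Path target = confDir.resolve(entry.getName()).normalize();
        if (!target.startsWith(confDir)) {
          throw new IOException("illegal zip entry " + entry.getName());
        }
        if (entry.isDirectory()) {
          Files.createDirectories(target);
        } else {
          Files.createDirectories(target.getParent());
          Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }
}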