Patch Procedure¶
Abstract
Patching means updating a specific component, or a set of components, with a patch file retrieved from the punch online site. This procedure is not a platform update or upgrade.
Patchable Components¶
The following components support patching:
- Shiva : the shiva scheduler service can be patched. Refer to this procedure.
- Shiva applications : shiva is used to start a number of punch applications (punchlines, logstash, monitoring apps, etc.). These can be patched as well. Refer to this documentation.
- Punch Gateway : the punch API and REST gateway together with its underlying actions can be patched. Refer to this procedure.
- Operator environment : the operator unix environment and the channelctl, punchlinectl, etc. commands can be patched. Refer to this procedure.
- Punch internal binaries : some punch binaries shipped alongside the official spark, storm and pyspark binaries can be patched. Refer to the Storm procedure and the Spark/Pyspark procedure for more information. Note that in the special case of the punch built-in Storm components, those are updated by patching the operator nodes and restarting the related channels, not by patching the Storm cluster itself.
Non Patchable Components¶
The Storm, Spark, Elasticsearch, Zookeeper, Minio, Clickhouse clusters and the Kibana application cannot be patched. This does not mean you cannot conduct an update or upgrade procedure with a rollback strategy; it simply means the punch does not provide a custom lightweight patch strategy for them.
To address the update of these components, get in touch with the punch professional service team.
Patch Procedure¶
The punch makes it easy and safe to patch a production platform. But even so, patching a production platform is a critical action. Make sure you have a clear idea of:
- who is responsible ? : patching a component always has consequences, sometimes unexpected. Make sure the impacts are identified, reviewed and accepted by the platform owner.
- what is the downtime ? : most often there is little or no downtime, but it depends on the component you patch. Make sure you know what will be impacted and for how long. When patching multiple nodes of a highly-available cluster addressed by a single Virtual IP (shiva nodes holding network listeners, gateway servers), you may want to limit the update command to one node at a time (using -l), and manually manage the Virtual IP location (through your platform system-level Virtual IP management commands) so that the IP is always served by a node you are not currently updating.
- will it generate monitoring alerts ? : patching a component requires some sort of restart. Make sure the monitoring plane will not generate false alarms.
- are you ready to roll back ? : the punch patching procedure is designed to be simple and easy to roll back. Make sure you know how to do it before patching.
If these are cleared up, proceed to the following procedure. Make sure you complete all the steps in order.
1. Platform Check¶
First, check the version of your platform:
# Go to an operator device and type
punchplatform-version.sh
# You must have this kind of output:
PunchPlatform package:
X.Y.Z
Next, check whether a patch is already installed on your platform. This is a key point. Go to the operator, shiva and gateway servers and check whether a /data/opt/punchplatform-binaries-X.Y.Z/patch folder exists. If it contains jars or libraries, first check whether the latest stable punch release already fixes the issue that required the patch(es). Refer to the online documentation and/or email the support team at support@punchplatform.com. It might be necessary to first update your platform to the latest minor release corresponding to your release in order to get rid of unnecessary patches.
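The check above can be scripted on each server. The following is a minimal sketch, not a punch tool: the function name check_existing_patch is hypothetical, and the path you pass must use your actual X.Y.Z version.

```shell
#!/bin/sh
# Sketch only: report whether a patch folder exists and what it contains.
# check_existing_patch is a hypothetical helper, not part of the punch tooling.
# Usage: check_existing_patch <patch_dir>
check_existing_patch() {
    patch_dir="$1"
    if [ ! -d "$patch_dir" ]; then
        echo "no patch folder: $patch_dir"
        return 0
    fi
    # Collect every regular file (jars, pex files, ...) at any depth.
    files=$(find "$patch_dir" -type f | sort)
    if [ -z "$files" ]; then
        echo "patch folder is empty: $patch_dir"
    else
        echo "patch files found in $patch_dir:"
        echo "$files"
        # Non-zero status so that a calling script can stop here.
        return 1
    fi
}
```

For example, run check_existing_patch /data/opt/punchplatform-binaries-X.Y.Z/patch on an operator server. A non-zero exit status means files were found and the existing patches should be reviewed before going further.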
Assuming you have no patch, or you have updated your platform as just explained, check the platform status and make sure :
- all the services are green in the punch monitoring dashboard
- your supervisor (nagios or any other monitoring component) reports a green status, especially with respect to system, storage, memory and CPU monitoring.
Do not proceed if that status is unclear. Contact your local support or, failing that, the punch support team.
2. Patch Preparation¶
Download the patch, most often a zip or a jar file, from the online punch download area.
Next, update the deployer folder. That folder is project specific and located on the deployer server (i.e. the server from which you deployed your platform).
Create the following folder if it does not already exist :
cd <deployer_path>
mkdir -p archives/patch/
Copy the downloaded patch into that folder. Create a git tag before patching.
Important
The patch can be a single jar, or a zip with a subdirectory (e.g. storm, spark, ...) containing jars or pex files. In the latter case you must unzip the patch and keep its tree structure in your patch directory (e.g. archives/patch/spark/mypatch.jar).
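Staging the patch can be sketched as follows. This is an illustration under assumptions: stage_patch is a hypothetical helper, the zip is assumed to hold its component subdirectories (storm/, spark/, ...) at its top level, and the unzip utility is assumed to be installed on the deployer server.

```shell
#!/bin/sh
# Sketch only: copy or unzip a downloaded patch into <deployer_path>/archives/patch.
# stage_patch is a hypothetical helper, not part of the punch tooling.
# Usage: stage_patch <patch_file> <deployer_path>
stage_patch() {
    patch_file="$1"
    patch_dir="$2/archives/patch"
    mkdir -p "$patch_dir" || return 1
    case "$patch_file" in
        *.zip)
            # unzip recreates the subdirectories stored in the archive,
            # which keeps the tree structure (e.g. spark/mypatch.jar).
            unzip -o "$patch_file" -d "$patch_dir" ;;
        *.jar)
            cp "$patch_file" "$patch_dir/" ;;
        *)
            echo "unsupported patch file: $patch_file" >&2
            return 1 ;;
    esac
}
```

After staging, list the content of archives/patch and make sure it matches the layout expected for the component you patch.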
Next, still on the same deployer server, go to $PUNCHPLATFORM_CONF_DIR and tag the version
cd $PUNCHPLATFORM_CONF_DIR
git pull
git tag -a v<CURRENT_VERSION> -m "before migrate to <NEW_VERSION>"
git push --tags
Generate the deployment configuration with the following command. The generated files have no impact on the running platform; this is only a local operation.
punchplatform-deployer.sh --generate-inventory
That command regenerates the various inventories and variables used by ansible to deploy the punch components. This command has no effect on the running platform. If errors are reported, please check the punchplatform-deployment.settings configuration, and contact the punch support.
Push the new configuration. This operation has no impact on the running platform.
git push
Important
Before going to the next step, make sure you communicated your procedure to the platform stakeholders.
3. Patch Deployment¶
Deploy the patch to all punch servers.
punchplatform-deployer.sh --deploy -Kk -t patch
Depending on what you patch (operator, shiva, gateway, storm, spark ...), you can select hosts by executing :
punchplatform-deployer.sh --deploy -Kk -t patch -l punchplatform_operator_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l shiva_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l gateway_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l storm_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l spark_servers
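For the highly-available case described earlier (several nodes behind a single Virtual IP), the same command can be issued one node at a time. The sketch below only prints the commands, so that you can move the Virtual IP between two runs. The node names are hypothetical, and passing a host name to -l is an assumption based on the usual Ansible limit semantics.

```shell
#!/bin/sh
# Sketch only: print (do not run) one patch deployment command per node.
# print_rolling_patch_commands is a hypothetical helper; the arguments are
# inventory host names, which are assumptions to adapt to your platform.
print_rolling_patch_commands() {
    for node in "$@"; do
        # -l restricts the deployment to the given node.
        echo "punchplatform-deployer.sh --deploy -Kk -t patch -l $node"
    done
}

# Example with two hypothetical shiva nodes sharing one Virtual IP.
print_rolling_patch_commands shiva-node-1 shiva-node-2
```

Run each printed command only once the Virtual IP is served by a node other than the one being patched.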
Warning
You can only have one patch per application/component on a server (deployer or target). Therefore, if multiple patches for the same application are detected on the deployment server during deployment, a failure message appears. On the target servers, a prompt appears instead: if you continue, the deployer is the source of truth and the previously deployed patch for that application is removed. To deploy without user validation (ansible prompt), pass the following ansible extra-var to force the automatic update of patches : -e force_patch=true
Next test the new patch by restarting all or some channels. It is of course much safer to restart only a single channel and check it runs as expected:
channelctl stop --channel <channel>
channelctl start --channel <channel>
Make sure :
- the channel stop and start went fine
- the behaviour of the channel is correct : eps rate, failure rate, content of the data in Elasticsearch, in the SIEM, etc.
- the monitoring status is fine.
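The single-channel restart can be wrapped so that a failed stop never goes unnoticed. This is a minimal sketch: restart_channel is a hypothetical helper, and it assumes channelctl returns a non-zero exit status on failure, which is not confirmed here.

```shell
#!/bin/sh
# Sketch only: restart one channel and stop at the first failing command.
# restart_channel is a hypothetical helper, not part of the punch tooling.
# Assumption: channelctl exits with a non-zero status when it fails.
# Usage: restart_channel <channel>
restart_channel() {
    channel="$1"
    if ! channelctl stop --channel "$channel"; then
        echo "stop failed for channel $channel" >&2
        return 1
    fi
    if ! channelctl start --channel "$channel"; then
        echo "start failed for channel $channel" >&2
        return 1
    fi
    echo "channel $channel restarted, now check rates and monitoring"
}
```

A successful restart only confirms that the commands ran. The eps rate, the failure rate and the monitoring status still have to be checked manually.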
Tip
If errors appear, do not panic! Roll back your environment with the old deployer (roll back the deployment environment variable and redeploy the operators).
4. Final Checks¶
Tag the end of migration from the operator device:
cd $PUNCHPLATFORM_CONF_DIR
git pull
git tag -a v<NEW_VERSION> -m "end of migration to <NEW_VERSION>"
git push --tags
Announce the end of the patching to all stakeholders.
5. Rollback procedure¶
Remove the patch directory on target nodes. Depending on what you have patched (operator, shiva, gateway, storm, spark ..) execute :
punchplatform-deployer.sh --ssh punchplatform_operator_servers "rm -rf {{install_root}}/{{binaries_version}}/patch"
punchplatform-deployer.sh --ssh shiva_servers "rm -rf {{install_root}}/{{binaries_version}}/patch"
punchplatform-deployer.sh --ssh gateway_servers "rm -rf {{install_root}}/{{binaries_version}}/patch"
punchplatform-deployer.sh --ssh storm_servers "rm -rf {{install_root}}/{{binaries_version}}/patch"
punchplatform-deployer.sh --ssh spark_servers "rm -rf {{install_root}}/{{binaries_version}}/patch"
Then update the patch directory on the deployment device with the previous patches, or remove the existing patches, and redeploy :
punchplatform-deployer.sh --deploy -Kk -t patch -l punchplatform_operator_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l shiva_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l gateway_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l storm_servers
punchplatform-deployer.sh --deploy -Kk -t patch -l spark_servers
Patching role and/or inventory template¶
In this example, we patch the kafka role and inventory template to show how it is done. The PFS team will deliver a zip archive containing the following modified files :
deployer_patch/
├── inventory_templates
│ ├── group_vars
│ │ └── kafka_servers.group_vars.j2
│ └── punchplatform-deployment-inventory-template.j2
├── pp_roles
│ └── kafka-setup
│ ├── meta
│ │ └── main.yml
│ ├── tasks
│ │ └── main.yml
│ └── vars
│ └── main.yml
└── README_2022-09-07T15:42:46+02:00_PATCH_PUN-XXXX_kibana_issues_b466ef1b96
Unzip deployer patch¶
Just unzip the provided patch zip in your punchplatform deployer root dir. It creates a deployer_patch directory at the root level of your deployer (at the same level as the pp-roles and inventory_templates directories).
# in your target/patched_opensearch zip
unzip -d /data/punch-deployer-x.y.z/ patched_PUN-XXXX_opensearch_issues-x.y.z-SNAPSHOT-b466ef1b96.zip
Warning
Be aware that this unzip command merges the patched roles and inventory templates with those from previous patches.
Your deployer is now patched! In your deployer_patch directory you will find the README files indicating the scope of each patch. Please leave these README files in your deployer: they are useful to identify which patches have been applied.
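To see which deployer patches have been applied, the README files can be listed. The following is a minimal sketch: list_deployer_patches is a hypothetical helper, and it assumes the README files sit directly under deployer_patch with a name starting with README_, as in the tree shown earlier.

```shell
#!/bin/sh
# Sketch only: list the README files left by deployer patches.
# list_deployer_patches is a hypothetical helper, not part of the punch tooling.
# Usage: list_deployer_patches <deployer_root>
list_deployer_patches() {
    patch_root="$1/deployer_patch"
    if [ ! -d "$patch_root" ]; then
        echo "no deployer patch applied under $1"
        return 0
    fi
    # One README per applied patch is expected at the top level.
    find "$patch_root" -maxdepth 1 -type f -name 'README_*' | sort
}
```

For example, list_deployer_patches /data/punch-deployer-x.y.z prints one line per README file, each name carrying the patch date, ticket and commit.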
How to check the patch is active¶
If the patch was built by the PFS team, you will notice the following :
- A warning message for each patched inventory template file when running the command punchplatform-deployer.sh --generate-inventory
punchplatform-deployer.sh --generate-inventory
LOG: checking your dependencies...
LOG: all the required packages are installed !
Warning: generating global inventory from patched template '/data/punch-deployer-6.4.5/deployer_patch/inventory_templates/punchplatform-deployment-inventory-template.j2'.
LOG: audit the punchplatform configuration...
LOG: found punchplatform-deployment.settings: /home/bertrand/Documents/PUNCH-6.4.X/punchbox/punch/build/pp-conf/punchplatform-deployment.settings
LOG: generating deployment configuration from punchplatform.properties + punchplatform-deployment.settings...
LOG: Following Ansible "groups" and "tags" can be used as additional deployment parameters:
LOG: $ punchplatform-deployer.sh --deploy [-Kk] [-l group1,group2] [-t tag1,tag2]
- group: punchplatform_cluster tag: (no tag)
- group: (no group) tag: operator
- group: shiva_servers tag: shiva
- group: clickhouse_servers tag: clickhouse
- group: elasticsearch_servers tag: elasticsearch
- group: gateway_servers tag: gateway
- group: kafka_servers tag: kafka (patched by /data/build/punch-deployer-6.4.5/deployer_patch/inventory_templates/group_vars/kafka_servers.group_vars.j2)
- group: kibana_servers tag: kibana
- group: metricbeat_servers tag: metricbeat (patched by /data/punch-deployer-6.4.5/deployer_patch/inventory_templates/group_vars/metricbeat_servers.group_vars.j2)
- group: minio_servers tag: minio
- group: (no group) tag: operator
- group: shiva_servers tag: shiva
- group: spark_servers tag: spark
- group: storm_servers tag: storm
- group: zookeeper_servers tag: zookeeper
LOG: start punchplatform deployment...
LOG: using the additional paramters: -t gateway -u vagrant
LOG: SSH user: bertrand
INFO: using patched playbook file '/data/build/punch-deployer-6.4.5/deployer_patch/deploy-punchplatform-production-cluster.yml'.
- For each patched role, a single pause message is displayed for all the servers concerned by this role.
Pausing for 60 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
[kafka-setup : pausing for patch message]
Patch=PUN-XXXX_kafka_issues, git_branch=1759-patch-procedure-for-deployer-roles, commit=b466ef1b96, commit_date=Wed Sep 7 15:42:46 2022 +0200:
Press 'C' to continue the play or 'A' to abort
Create a patch¶
Check this documentation if you want to create a patch.