Track 4 Upgrade process¶
This track covers the fundamentals of the upgrade procedure on a running platform.
Before considering a migration, you should check every release note between your running version and the target version.
A release note briefly explains the major changes embedded in the version. It is very important to consult it to get a functional overview of the changes made.
For example, check the 6.2.0 Release Note to get an overview.
Each release note has an associated upgrade note, which describes in detail each change you need to apply to your configuration before considering the migration.
For example, check the Upgrade notes from 6.1 to 6.2.
We cannot write a migration procedure as we have done for the patching procedure, because it would differ too much from one platform to another, depending on the running applications and their SLAs, and on the nature of the underlying open-source framework evolution.
Indeed, each platform is different: some cannot afford a service interruption on their Kafka clusters, others on their Elasticsearch clusters, and still others on both.
In this context, we must therefore work on a case-by-case basis with each client on the best procedure to adopt, as it would be unrealistic to build a complete procedure suitable for all use cases at the same time. Some platform owners have sufficient internal skills and MCO team manpower to handle minor updates, or even full deployments, on their own. Most will request workshops with the Punch Professional Services team to prepare these migrations, and/or migration procedure writing and support for execution.
Of course, once the migration procedure has been finalized with the customer, it is not executed manually: it leverages the deployment tool and the associated Ansible inventory, which offer many possibilities for remote and group actions on the platform.
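As an illustration, such group actions rely on the inventory grouping the platform nodes per cluster. The sketch below is purely hypothetical: the group and host names depend entirely on your actual deployment inventory.

```yaml
# Hypothetical Ansible inventory sketch: nodes are grouped per cluster,
# so that upgrade actions can target a whole group remotely.
all:
  children:
    kafka_servers:
      hosts:
        kafka1.example.org:
        kafka2.example.org:
    elasticsearch_nodes:
      hosts:
        es1.example.org:
        es2.example.org:
```

With such groups in place, an action such as restarting a service or deploying an updated package can be pushed from the deployment host to a whole cluster, or to a single node of it.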
Minor vs Major¶
All updates are about the evolution of some software components. Minor updates are those where the evolved components remain compatible enough to allow upgrading the software versions "in place", one node at a time in each cluster, with little service interruption: the overall configuration and inter-framework compatibility is still ensured, so no simultaneous framework updates or pipeline configuration changes are required.
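The "one node at a time" approach can be sketched as an Ansible playbook using the `serial` keyword. The inventory group, package name, and port used here are assumptions for the sake of the example:

```yaml
# Minimal rolling-upgrade sketch (assumed group, package, and port):
# each node is upgraded and restarted before the next one is touched,
# so the cluster keeps serving requests during the upgrade.
- hosts: elasticsearch_nodes   # hypothetical inventory group
  serial: 1                    # act on one node at a time
  tasks:
    - name: Upgrade the package on this node
      package:
        name: elasticsearch
        state: latest
    - name: Restart the service so the new version takes over
      service:
        name: elasticsearch
        state: restarted
    - name: Wait for the node to be reachable again before moving on
      wait_for:
        port: 9200
        timeout: 300
```

This only works for minor upgrades: it assumes that nodes running the old and new versions can coexist in the same cluster while the rolling pass is in progress.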
Major ones usually imply:
- preparing an updated version of the deployment and pipelines
- a specific order for updating versions of frameworks
- most often, a step where at least part of the processing has to be stopped to allow for a consistent update of several different components (e.g. Apache Kafka + ZooKeeper + Punchplatform operator environment)
Major upgrade procedures have to be designed with a tradeoff in mind: service interruption duration vs. complexity of the overall upgrade procedure.
Typical major upgrade project¶
The following steps are typically followed by large platform or product owners:
1- Requesting a workshop to initialize the upgrade procedure design with customer and Punch PFS team members. Goal: identify the constraints:
- important SLAs to consider during the migration procedure (what are the truly critical functions for which the platform owner has limited downtime allocation)
- possible upgrade scenarios (e.g. the possibility to temporarily allocate additional VMs or servers during the upgrade)
- existing data that must be preserved or migrated during the upgrade phase
- decisions the customer must make, either to leverage new features or to adapt its existing custom configuration to the new version
- skills available in the customer team to execute the upgrade, and the level of associated support needed
2- (1 or 2 weeks later) The Punch PFS team identifies the envisioned amount of preparatory activities, e.g.:
- identification and synthesis of the update steps
- if needed, an inter-version compatibility assessment between the components used on the platform
- dry run of unitary update steps on test platforms
3- (2 to 3 weeks) After customer review and approval, the preparation activities are conducted on both sides:
- platform owner: provisioning of new or temporary infrastructure resources (OS updates, firewalling, VMs, storage...)
- platform owner or PFS team: prepare the existing configuration to ease the evolution (re-templating)
- PFS team: compatibility tests and documentation of the update step sequencing, with rationale
- PFS team: prepare the updated configuration, test the updated channel templates
4- (1 to 4 weeks) Software component updates on the chosen production platform(s) or integration platform
5- Optional deployment of additional configuration updates for enhanced/new features, along with the associated monitoring impact on the external monitoring system
6- If the customer team is in charge of upgrading target production platforms or boxed products, based on an upgrade dry run performed with PFS team support on a factory platform: a phase of review/support while the customer documents the custom production update procedure.
Complex vs Easy major upgrades¶
An upgrade is complex if it involves temporary/intermediate steps and scheduling:
- requiring the deployment of multiple running versions of the same technology at the same time
- requiring progressive switches of responsibility between the previous and the updated version
- requiring many checks before switching production flows, to reduce the risk of service interruption
An upgrade is much simpler and involves fewer risks if:
- some additional resources are available for deploying and checking new versions of part of the system without immediately stopping/removing the previous version (easier rollback and less interruption)
- maintenance downtime is allowed: leveraging multiple levels of retention, it is possible to keep storing input data while upgrading later processing stages, without data loss but with a real-time interruption of some services.
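As a sketch of this retention-based approach, the Kafka input topics can be sized so that their retention covers the planned downtime of the downstream processing stages. The topic-level settings below are standard Kafka configuration, but the values are hypothetical and must be matched to the actual input rate and available disk capacity:

```properties
# Topic-level retention on a hypothetical input topic: keep at least
# 72 hours of data (or 500 GB per partition, whichever is hit first),
# so that processing stopped during the upgrade can catch up afterwards
# without data loss.
retention.ms=259200000
retention.bytes=500000000000
```

With such retention in place, the input stages keep absorbing data during the maintenance window, and the upgraded processing stages resume from their last committed offsets once restarted.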