By Chris Hall, Senior Consultant at Nimble Approach.

This solution is implemented using AWS, provisioned with Terraform, and configured via Ansible – but the principles broadly apply.

Many organisations rely on legacy services. These systems are often business-critical, yet because they generally just work, they receive little attention, so the teams that manage them end up unfamiliar with their inner workings.

Sooner or later, a moment arrives when everything needs to be upgraded at once: the instance type, the data volume, the operating system, and the service version itself. This creates a high-stakes scenario.

Here’s a recent example of such an upgrade that we carried out for a customer on AWS. The system in question is provisioned and configured using Terraform and Ansible.

As you read on, see if you can work out which critical system was involved.

The mysterious legacy service, running on EC2

The Problem: Live Upgrades are Hard

Performing upgrades like these on live, business-critical instances is a challenge. There are often no development or test environments to practise on. Furthermore, upgrading the service package can have unknown effects on its persistent data.

While the existing codebase, built with tools like Terraform and Ansible, may be functional for day-to-day operations, it often doesn’t support a safe way to carry out a major upgrade.

In the system we’re working with here, a client relies on JNLP to interact with the instance being upgraded. Keep this in mind – it becomes important later!

The Full Picture

Ideation: Finding a Path Forward

To reduce risk, a blue-green release strategy was chosen. This approach involves running two parallel production services: “green” for the legacy service and “blue” for the new one. It provides a method to test the new service in parallel and then switch live traffic over to it when ready.

Green to blue cutover via a toggle

One option considered was to run parallel instances of just the component being upgraded. The challenge then is deciding how to direct traffic between the two versions.

Using a custom HTTP header to route requests during the upgrade might seem like a potential solution – AWS Application Load Balancer (ALB) listener rules can match on custom header values – but it was ultimately not viable. Every client would need explicit configuration to send the header. More critically, this method would not work for the system’s JNLP traffic, which is not HTTP at all and is outside the scope of what an ALB can handle.

The better approach was to duplicate the entire stack. This leaves the current live service untouched while keeping complexity to a minimum, and it allows the existing code to be refactored to support two stacks with a toggle between them.

Parallel deployments before cutover

The Solution: A Modern Approach with Infrastructure as Code

The first step was to refactor the existing Terraform code into a reusable module. Using the count meta-argument at the module level provides the ability to dynamically create and destroy entire stacks as needed.
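
As a minimal sketch (the enable_green / enable_blue flags and the ./modules/stack path are illustrative, not the customer’s actual code), the layout looks something like this:

```hcl
# Sketch only: variable names, module path, and inputs are hypothetical.
# count on a module block requires Terraform 0.13 or later.

variable "enable_green" {
  type    = bool
  default = true
}

variable "enable_blue" {
  type    = bool
  default = false
}

module "green" {
  source = "./modules/stack"
  count  = var.enable_green ? 1 : 0

  colour = "green"
  # ...instance type, data volume, security groups, etc.
}

module "blue" {
  source = "./modules/stack"
  count  = var.enable_blue ? 1 : 0

  colour = "blue"
  # ...
}
```

Flipping either flag to false and applying tears down that entire stack, which is what makes the decommissioning step later a one-line change.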

The toggle itself is controlled by a variable within Terraform. This determines the DNS record and certificate, allowing the live state to be switched between the green and blue stacks. This toggling process takes only about four minutes to plan and apply.
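
As a sketch of the DNS side (the lb_dns_name and lb_zone_id outputs are assumed outputs of the hypothetical stack module, and live_stack is the toggle variable declared in the next snippet):

```hcl
# Select whichever stack is currently live; one() yields the single
# module instance, or null if that stack has been disabled.
locals {
  live = var.live_stack == "green" ? one(module.green[*]) : one(module.blue[*])
}

resource "aws_route53_record" "service" {
  zone_id = var.zone_id            # hosted zone ID, supplied elsewhere
  name    = "service.example.com"  # placeholder domain
  type    = "A"

  alias {
    name                   = local.live.lb_dns_name
    zone_id                = local.live.lb_zone_id
    evaluate_target_health = true
  }
}
```

The certificate attachment follows the same pattern, keyed off the same variable.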

To ensure safety, a validation condition was added to the Terraform variables, making it impossible for both stacks to be live at the same time.
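
One way to express that guarantee is a single enum-style variable with a validation block – because only one value can ever be selected, both stacks can never be live at once (a sketch, assuming the live_stack variable used above):

```hcl
variable "live_stack" {
  type        = string
  description = "Which stack serves live traffic. Exactly one of: green, blue."
  default     = "green"

  validation {
    condition     = contains(["green", "blue"], var.live_stack)
    error_message = "live_stack must be exactly \"green\" or \"blue\", never both."
  }
}
```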

Ansible was also enhanced. By adding an IsLive tag to the infrastructure, Ansible playbooks could safely target the non-live instance by default. Applying configuration to the live service became a deliberate action, rather than an accident.
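
A sketch of how that targeting can work, assuming the amazon.aws.aws_ec2 dynamic inventory plugin and an IsLive tag of "true" or "false" set by Terraform (the region and group prefix are illustrative):

```yaml
# aws_ec2.yml - dynamic inventory sketch
plugin: amazon.aws.aws_ec2
regions:
  - eu-west-2
keyed_groups:
  # Produces groups such as islive_true and islive_false
  - key: tags.IsLive
    prefix: islive
```

Playbooks can then default to hosts: islive_false, so applying configuration to the live instance requires an explicit, deliberate override.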

Terraform Toggles with Validation

Toggle Time!

Once the new blue service was tested and ready, a pull request was merged and applied with Terraform to make the switch.
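
The merged change itself can be as small as one line (a sketch, reusing the hypothetical live_stack variable from earlier):

```hcl
# terraform.tfvars - the entire cutover
live_stack = "blue"
```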

After the toggle was complete and the new service was confirmed to be stable, the old green environment was decommissioned by changing a single variable in Terraform.
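
With the module layout sketched earlier, that single variable is the old stack’s enable flag; the subsequent apply destroys every resource in the green stack:

```hcl
# terraform.tfvars - decommission the old green stack
enable_green = false
```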

This immediately helps reduce operating costs.

Best of all, this entire process is repeatable. The next time an upgrade is needed, a new green stack can be spun up and the pattern can be run again.

This robust pattern was used to upgrade a common but often mysterious service that many organisations rely on: Jenkins.

Post-cutover with a shiny new version of Jenkins

Author’s Bio

Chris Hall is a Senior Consultant at Nimble Approach, with many years of experience as an amazing Platform Engineer. If you’ve had the pleasure of working with Chris, then you know he leaves any team, system, or business in a better place than it was before.