Delivering Seamless Experiences During Major Events - Scaling quickly and cost-effectively.

By Zak Catherall – Consultant Software Engineer at Nimble Approach

Big events get big traffic

Major tournaments and competitions play a significant role in the world of betting and gaming products. Events like the FIFA World Cup, the Super Bowl, and festivals like the Grand National attract hardcore fans and casual viewers alike. These major events incite a tremendous surge in user engagement, website visits and app usage.

This surge in users presents lucrative opportunities for operators of betting and gaming products, with the revenue generated often representing a significant part of yearly earnings.

For example, we’ve found that website traffic for our Betting and Gaming clients can be more than 5x higher on event days like the Grand National than on a typical winter weekday. We’ve seen Cheltenham double that again, with more than 70,000 page hits per minute for one client.

However, despite the surge, users will engage with your products expecting that they will be able to view results, place bets, and follow coverage like any other day of the year.

Failing to deliver a seamless experience could mean your customers turn to competitors.

To meet the demands of these opportunities, operators must effectively manage the increased system load that record-breaking engagement brings, whilst balancing costs and other factors. However, the rarity of such events and the sporadic nature of traffic during them present some unique challenges, and these “other factors” are sometimes neglected.

Challenges of scaling

Building scalability into systems is usually a priority, but it is especially crucial for gaming products. The structure of many sporting events translates to semi-predictable (or semi-unpredictable) load patterns. Systems must be able to scale up and down quickly in response to dynamic traffic patterns, without disrupting services.

Generally, there is anticipation of a spike in traffic as users rush to check results at the end of events (e.g. a football match). Throughout events, we expect a sustained increase in traffic compared to a typical period. Yet, it is impractical to build systems that accommodate all scenarios. This impracticality is compounded by the varying schedules of tournaments and festivals.

There are also challenges around costs, maintainability, and the speed of delivery. Often, there will be tradeoffs depending on the technical strategy and maturity of the business, team size, and resource availability. Therefore, providers must carefully consider their vision and capabilities while adopting architectures and tools. Failing to do so can jeopardise the coverage of these significant events.

Scaling Solutions and Their Trade-offs

Ultimately, to maintain seamless experiences for your customers, you must absorb traffic influxes without degrading the service. This begins with identifying what a typical day or normal traffic looks like for a given app or product. This understanding is essential for effectively managing costs and scaling resources to maintain availability.

Scaling can be scheduled in advance of expected busy periods. However, this naïve approach comes with caveats. Overestimates lead to unnecessary costs and underestimates degrade the user experience. Both scenarios negatively affect profit margins. A preferred approach combines proactive and reactive scaling, adjusting resources based on the load at any given time.

At Nimble Approach, we’ve seen particular success with the Amazon Web Services (AWS) ecosystem, using technology like AWS Lambda and EKS, and have built up expertise with these technologies in the process.
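
As a rough illustration of combining proactive and reactive scaling, the sketch below takes the larger of a scheduled minimum and the capacity the current load calls for. It is a minimal model, not a description of any AWS feature or of our production setup: the names ScheduledFloor and desired_instances, the dates, and the figure of 50 requests per second per instance are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScheduledFloor:
    """Proactive rule: keep at least `min_instances` running inside a time window."""
    start: datetime
    end: datetime
    min_instances: int


def proactive_floor(now: datetime, schedule: list[ScheduledFloor], baseline: int) -> int:
    """Return the highest scheduled minimum active at `now`, or the everyday baseline."""
    active = [rule.min_instances for rule in schedule if rule.start <= now < rule.end]
    return max(active, default=baseline)


def reactive_target(requests_per_second: float, rps_per_instance: float) -> int:
    """Return the instance count needed to serve the load observed right now."""
    return math.ceil(requests_per_second / rps_per_instance)


def desired_instances(
    now: datetime,
    schedule: list[ScheduledFloor],
    baseline: int,
    requests_per_second: float,
    rps_per_instance: float,
    max_instances: int,
) -> int:
    """Combine both signals: never drop below the floor, never exceed the cost cap."""
    floor = proactive_floor(now, schedule, baseline)
    needed = reactive_target(requests_per_second, rps_per_instance)
    return min(max(floor, needed), max_instances)


# Hypothetical race day: hold at least 10 instances between 10:00 and 18:00.
race_day = [ScheduledFloor(datetime(2024, 3, 12, 10), datetime(2024, 3, 12, 18), 10)]

# 70,000 page hits per minute is roughly 1,167 requests per second.
peak = desired_instances(datetime(2024, 3, 12, 15), race_day, 2, 70_000 / 60, 50, 40)
quiet = desired_instances(datetime(2024, 3, 13, 15), race_day, 2, 100, 50, 40)
print(peak, quiet)
```

The scheduled floor protects against the delay before reactive scaling kicks in, while the cap keeps an unexpected spike from running up unbounded cost.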

Serverless – Success story

One example is a free-to-play game that runs over a specific major sporting event and is subject to a lot of the challenges mentioned above. We delivered this game over a six-month period, in time for the sporting event. We needed to complete development quickly and accurately, keeping agility, scalability and cost-effectiveness at the forefront throughout the project.

AWS Lambda’s capabilities were a perfect match in this case. The cold-start time for Lambdas tends to be under 100ms, which was well within acceptable limits for our use case. Our serverless functions were also optimised so that fewer than 1% of invocations would cause a new Lambda instance to start, even at peak times. These characteristics minimised the risks to services during high-profile events, ensuring seamless user journeys throughout.

The Lambda functions performed exceptionally well in our case for a number of reasons. The game we built was entirely client-facing and the biggest traffic peaks were largely predictable, centred around the timeline of the sporting event. Paired with DynamoDB, which scales alongside our serverless functions and offers on-demand pricing, we were able to achieve massive cost savings in low-traffic periods and scale up to meet the peak user volumes.

The move away from traditional storage methods was pivotal and introduced new challenges. Unlike conventional relational databases, DynamoDB is a key-value store: each item is addressed by a partition key and an optional sort key, and there are no joins between tables. Optimising the layout of your data to match your application’s access patterns can vastly improve read/write times and reduce usage costs. For established products and services with a lot of pre-existing data, this optimisation can be a huge undertaking. For example, in Deliveroo’s case, this required their data to be entirely remodelled.
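
To make the idea of laying data out around access patterns more concrete, here is a small in-memory sketch. It is not the DynamoDB API and not our actual schema: InMemoryTable is a toy stand-in, and the key formats (USER#..., ENTRY#...) are hypothetical. It assumes the main access pattern is "fetch all of one user’s game entries, optionally for a given day", and shapes the keys so that a single partition lookup answers it.

```python
from typing import Any

Item = dict[str, Any]


class InMemoryTable:
    """Toy stand-in for a key-value table with a partition key (pk) and sort key (sk)."""

    def __init__(self) -> None:
        self._items: dict[tuple[str, str], Item] = {}

    def put(self, item: Item) -> None:
        """Store an item, overwriting any existing item with the same pk and sk."""
        self._items[(item["pk"], item["sk"])] = item

    def query(self, pk: str, sk_prefix: str = "") -> list[Item]:
        """Return items in one partition whose sort key starts with `sk_prefix`, in sort-key order."""
        matches = [
            item
            for (item_pk, item_sk), item in self._items.items()
            if item_pk == pk and item_sk.startswith(sk_prefix)
        ]
        return sorted(matches, key=lambda item: item["sk"])


table = InMemoryTable()
# The profile and the entries share a partition, so related data sits together.
table.put({"pk": "USER#42", "sk": "PROFILE", "name": "Sam"})
table.put({"pk": "USER#42", "sk": "ENTRY#2024-03-13#race-4", "pick": "Horse B"})
table.put({"pk": "USER#42", "sk": "ENTRY#2024-03-12#race-1", "pick": "Horse A"})
table.put({"pk": "USER#7", "sk": "ENTRY#2024-03-12#race-1", "pick": "Horse C"})

# One lookup per access pattern, with no joins or scans across users.
all_entries = table.query("USER#42", sk_prefix="ENTRY#")
day_one_entries = table.query("USER#42", sk_prefix="ENTRY#2024-03-12")
print([entry["pick"] for entry in all_entries])
```

Because the date leads the sort key, entries come back in chronological order and a day’s entries can be selected by prefix. A different access pattern, such as "all entries for one race", would need a different key layout or a secondary index, which is why remodelling existing data can be such a large job.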

Serverless isn’t always the right answer, but when used in the correct scenarios it is a potent tool, owing to its vastly improved developer experience, theoretically infinite scalability, and reduced overheads. It is valuable to recognise that the agility of serverless functions also allows for phased or partial adoption. By gradually migrating away from legacy codebases, organisations can reduce costs, whilst generating buy-in for serverless and other cloud technologies.

Microservice – Success Story

Another example is a different series of games that operate year-round, covering many different sporting events. We maintain these as a managed service, including all testing, DevOps, and on-call responsibilities. With Amazon Elastic Kubernetes Service (EKS) we have achieved a consistent 99%+ uptime over several years, whilst actively delivering new features and supporting major events.

Unlike serverless, EKS requires additional setup and maintenance to help streamline development. Having engineers who are well-versed in test automation and DevOps practices is essential.

This expertise is especially important in the context of EKS, as new Kubernetes versions arrive roughly three times a year and each one generally leaves standard support after about a year. In smaller or less experienced teams, EKS maintenance may detract from development efforts. Given that the sporting events for our games occur year-round, this can be problematic if teams are not proactive about keeping the underlying platform infrastructure up to date.

As part of the initial setup and continuous improvement, we created comprehensive local development environments and testing environments. These are managed by CI/CD pipelines that gate the path to production. Having a high degree of automation ensures our ability to deliver new features and respond to incidents rapidly, even during major events.

Kubernetes allows for granular control in most aspects of the platform and is especially effective at scaling. That level of control brings overheads, and operating effectively requires some specialist knowledge. Horizontal Pod Autoscalers (HPAs) and the Kubernetes Cluster Autoscaler enable configurable horizontal scaling of both pods and nodes in the EKS cluster. These handle traffic spikes by creating additional resources as required. As with serverless, there is a delay while new pods and nodes start. We also find that scaling vertically ahead of exceptionally busy periods helps to maintain optimal performance.
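
For readers unfamiliar with how an HPA decides on a replica count, the sketch below reproduces the scaling rule described in the Kubernetes documentation: the desired count is the current count multiplied by the ratio of the observed metric to its target, rounded up, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The function name and the example numbers are our own, and real HPAs add further behaviour (stabilisation windows, readiness handling) that is omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA rule: ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to its target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the autoscaler.
    return max(min_replicas, min(desired, max_replicas))


# Hypothetical workload targeting 60% average CPU across pods.
print(desired_replicas(4, current_metric=90, target_metric=60, min_replicas=2, max_replicas=20))
print(desired_replicas(4, current_metric=63, target_metric=60, min_replicas=2, max_replicas=20))
print(desired_replicas(10, current_metric=200, target_metric=50, min_replicas=2, max_replicas=20))
```

The first call scales from 4 to 6 pods, the second stays at 4 because the metric is within tolerance, and the third is capped at the configured maximum of 20. The cap is what makes the choice of max_replicas, and the node capacity behind it, worth validating through performance testing before a major event.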

We determined appropriate scaling configurations through comprehensive performance testing and monitoring. AWS Budgets, CloudWatch, Prometheus, and Grafana also informed our alerting mechanisms. Kubernetes’ self-healing properties often resolved issues before they became visible to users. However, interacting with third-party providers means issues aren’t always the fault of the managed system. It’s also very easy to rack up enormous costs by running microservices 24/7, so proper monitoring can inform better cost optimisations.

In our experience, the key to success with EKS is the investment in robust processes and the developer experience. The costs of finer control are additional overheads. Automating as many of these overheads as possible enables competitive time to market, cost savings, and developer efficiency. Teams can instead focus on delivering new features and enriching user experience, ensuring users choose your platform over competitors.

Maximising Performance for your customers

Regardless of the tools, patterns, and architectures you’re working with, most approaches can be made to work, even if the journey is a little more arduous or expensive.

Platforms can maximise performance, availability, and resilience by leveraging a blend of caching, Content Delivery Networks (CDNs), Availability Zones (AZs), load balancing, horizontal scaling, and monitoring. These techniques work together to minimise response times and downtime and create seamless user experiences. It is advisable to continually evaluate and implement these methods to stay competitive.
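
As one small illustration of how caching reduces load during a spike, the sketch below keeps a result for a short time-to-live (TTL) so that repeated requests for the same data do not all reach the backend. It is a single-process example under assumed names (TTLCache, get_or_load, the "race-1" key); a production system would more likely lean on a CDN or a shared cache, and the TTL would depend on how stale a result is allowed to be.

```python
import time
from typing import Callable, Generic, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class TTLCache(Generic[K, V]):
    """Cache values per key for `ttl_seconds`, reloading them once they expire."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[K, tuple[float, V]] = {}

    def get_or_load(self, key: K, loader: Callable[[K], V]) -> V:
        """Return the cached value if it is still fresh, otherwise call `loader` and cache the result."""
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


backend_calls = 0


def load_result(race_id: str) -> str:
    """Hypothetical slow backend call, counted so the saving is visible."""
    global backend_calls
    backend_calls += 1
    return f"result for {race_id}"


cache: TTLCache[str, str] = TTLCache(ttl_seconds=5)
for _ in range(1000):
    cache.get_or_load("race-1", load_result)
print(backend_calls)
```

Even a TTL of a few seconds means that a burst of users checking the same result costs one backend call rather than one per request, which is the same principle a CDN applies at the edge.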

Regardless of the tools you use, Nimble can work with you to help maximise performance and opportunities.

Such techniques are not new or ground-breaking but should be a focal point for delivering seamless experiences. Likewise, regular performance testing should be carried out to assess system capabilities, identify weaknesses, and validate improvements. This is especially true in the lead-up to major events, with the prospect of record-breaking user numbers.

Hidden Costs and Secondary Considerations

When deciding on tools and architecture for a project, it’s easy to overlook secondary considerations such as developer experience, on-call supportability, and ops overheads – to name a few. Failing to consider these factors can lead to a domino effect of complications.

As an example, a poor developer experience can result in delays and lower-quality feature implementations. As a result, the time to market is stretched, costs are increased, and when the product is finally live, it runs with more disruptions and outages. Such issues can tarnish user experience, create additional expenses, and breed frustration within teams.

Dissatisfied teams may experience higher turnover, leading to a loss of expertise and continuity. Hiring new staff to fill these gaps can be costly, further delay progress, and contribute to the accumulation of technical debt. That technical debt further hinders development, increases maintenance efforts, and incurs even more expense.

Summary

To deliver seamless experiences during major events, it is crucial to consider architectural and tool choices holistically. Any choice can fail to deliver its intended benefits, and the ability to scale appropriately directly impacts success and profitability.

Options with unmanageable overheads can significantly increase the resources required for ongoing support, bug fixes, and system upgrades. These activities can consume a substantial amount of development time, leading to higher operational costs and indirectly impacting the user experience. However, such overheads might be an acceptable compromise for the more responsive, dynamic scaling and better fault resilience required to deliver seamless experiences during key events.

To ensure success, it is important to proactively analyse various factors, including cost benefits, developer experience, performance, maintainability, and time to market. Failing to consider these aspects holistically and in the context of your team can result in choices that undermine their intended advantages.

At Nimble, we have extensive experience determining and delivering the most suitable solutions for betting and gaming products. As specialists in this domain, we can guide your teams to make informed choices from the start of projects or help you maximise the potential of existing products. With our expertise, we can ensure that your technical choices align with your specific needs and goals, setting the foundation for successful products and services.

Author Bio

Zak Catherall – Consultant Software Engineer

Zak is a Consultant Software Engineer at Nimble Approach with experience developing and maintaining a variety of betting and gaming products. He has a passion for all things tech and loves the people aspect that consultancy brings.

Speak to our Betting and Gaming specialist

Raj Kissy, Betting and Gaming Specialist

LinkedIn: linkedin.com/in/raj-kissy
Email: [email protected]
