Why Gaming Hyperclouds Need AI for Dynamic Scalability


The fast-evolving demand in gaming requires the cloud to scale up and down dynamically and in real time. Complexity, human errors and comprehensive QoS/QoE policies can make or break the future of gaming and determine gaming platform winners. Artificial intelligence combined with machine learning is how gaming clouds will be administered in the near future, and testing strategies must be in place.

Game clouds are becoming so complex to manage that they are already pushing the limits of traditional business cloud and data center management capabilities. The gaming market is growing exponentially, and new platform features like the metaverse and AR/VR headsets have the potential to catapult it even further.

It’s time to get ready for even more complex and stringent game cloud management.

Game clouds are large with many dissociated moving parts. They are both geographically and structurally diverse. Importantly, they need to expand and contract in real time to match user needs.

The difficulty is that managing these dynamic clouds can require configuration of tens of protocols and hundreds of settings. That’s a lot of manual work, all vulnerable to inadvertent errors that we’ve seen lead to outages costing businesses tens of millions of dollars. Not to mention the high-profile negative publicity that accompanies these moments.

Gaming quality of service and experience (QoS/QoE) requirements are no longer just a question of bandwidth. As we discussed in our last post, every millisecond of travel time matters now. Add 4K, AR/VR’s 8K frame sizes and 8K adaptive bitrate streaming to the mix, and latency, jitter, frame rates, and symmetric uplink and downlink performance become more important than ever.

Addressing complexity, human error and QoS service-level agreements (SLAs) requires the cloud to scale up and down dynamically and in real time. Geographical spread and coverage demand that the gaming cloud scale its own network and reach out to peering partners when additional resources are needed. This must all be done in a matter of seconds, as complex and exacting SLAs hang in the balance.

This trifecta of issues—complexity, human errors and comprehensive QoS/QoE policies—can make or break the future of gaming and determine the success of some gaming platform providers over others.

AI brings power, speed, and intelligence to game cloud management

Artificial intelligence (AI) combined with machine learning (ML) is well positioned to handle gaming cloud challenges. ML is the workhorse of AI, creating the neural nets and prediction models that are then fed into AI systems. AI is the arbiter of the data and takes industry requirements and other factors into consideration to create the master output. This is how gaming clouds will be administered in the not-so-distant future.

A sample model of AI/ML admin utility in gaming.

We see four key priorities on the emerging AI/ML front:

1. Secure resources to meet expected needs

The objective is to ensure that just enough resources, and no more, are available to handle the QoS/QoE policy load.

Picture this scenario: 66,000 participants are projected to attend a trade show like the E3 video game industry event. AI will know there is a high probability that extra resources will be required in that region and will allocate the compute, storage, and network capacity needed to handle the predicted traffic.

Game clouds may have ML nodes in every data center to analyze quality of experience and build prediction models of expected traffic requirements. That data is then fed into the master AI, which determines the next best course of action. Are there enough resources to fill the need? If not, does a peering relationship need to be executed with a partner to add the required resources in real time?
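The decision described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names RegionForecast and plan_capacity, the one-session-per-participant load, the 500 sessions per server, the 100 locally available servers and the 10% headroom are all hypothetical and do not come from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionForecast:
    region: str
    predicted_sessions: int    # assumed output of an ML prediction model
    sessions_per_server: int   # assumed capacity of one game server
    servers_available: int     # servers the cloud itself can bring up in the region


def plan_capacity(forecast: RegionForecast, headroom_pct: int = 10) -> tuple[int, int]:
    """Return (own_servers, peer_servers) needed to carry the predicted load."""
    # Integer arithmetic avoids floating-point rounding surprises.
    padded_sessions = forecast.predicted_sessions * (100 + headroom_pct)
    # Ceiling division: a partially used server still has to be running.
    required = -(-padded_sessions // (100 * forecast.sessions_per_server))
    own = min(required, forecast.servers_available)
    return own, required - own


# Illustrative numbers only: one session per projected attendee.
event = RegionForecast("event-region", predicted_sessions=66_000,
                       sessions_per_server=500, servers_available=100)
own, peer = plan_capacity(event)
print(f"{event.region}: {own} own servers, {peer} requested from peering partners")
```

When the second value is greater than zero, the cloud's own capacity is not enough and a peering request would be the next step.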

2. Dynamic, real-time configuration

Dynamic configuration and resource expansion and contraction will be handled automatically by AI and ML. Based on ML algorithms, AI will calculate the quality of experience expected by the gamer, predict what’s needed to achieve that and automatically configure the exact resources required. AI will do this in seconds to milliseconds, not the minutes to hours it would take to accomplish the same process manually. Best of all, the risk of misconfiguration or security gaps is virtually eliminated.
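As a rough sketch of what automatic configuration against a QoE target might look like, the snippet below picks a serving site for a new session. Everything in it is an assumption made for illustration: the QoEPolicy and Site types, the select_site function, the site names and the latency and jitter figures are hypothetical, and a real system would weigh many more factors.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QoEPolicy:
    max_latency_ms: float   # assumed latency budget for the session
    max_jitter_ms: float    # assumed jitter budget for the session


@dataclass(frozen=True)
class Site:
    name: str
    latency_ms: float       # measured or predicted latency to the gamer
    jitter_ms: float
    free_slots: int         # sessions the site can still accept


def select_site(policy: QoEPolicy, sites: Sequence[Site]) -> Optional[Site]:
    """Pick the lowest-latency site that satisfies the policy, or None."""
    eligible = [
        s for s in sites
        if s.free_slots > 0
        and s.latency_ms <= policy.max_latency_ms
        and s.jitter_ms <= policy.max_jitter_ms
    ]
    return min(eligible, key=lambda s: s.latency_ms, default=None)


sites = [
    Site("edge-a", latency_ms=12.0, jitter_ms=2.0, free_slots=0),    # full
    Site("edge-b", latency_ms=18.0, jitter_ms=3.0, free_slots=40),
    Site("core-c", latency_ms=45.0, jitter_ms=1.0, free_slots=500),  # too slow
]
choice = select_site(QoEPolicy(max_latency_ms=30.0, max_jitter_ms=5.0), sites)
print(choice.name if choice else "no site meets the policy; expand or peer")
```

A result of None is the signal that existing resources cannot meet the policy and that expansion is needed.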

3. Keep ahead of hackers

Especially with the advent of gaming metaverse ecommerce, AI and ML will be critical to ensure the financial security and privacy of the gaming cloud.

On the dark side, however, hackers will use AI to discover and learn from user patterns to determine where they should insert malware. In response, AI will also be used to defend the gaming cloud by implementing cloud security policy decisions at a hyper rate to rapidly close threat windows. ML will learn patterns that could potentially be used to exploit the network and then build defenses into the architecture.

Because human administration is reactive by nature, human-administered networks will always be less secure than those administered by AI, which is predictive. For now, AI and people can cooperate, with AI performing continuous intelligence scans whenever a human administrator is changing policies.

4. Unlock green benefits

AI and ML also have a positive impact on green initiatives. AI can predict when resources are no longer needed and then preemptively deallocate and shut them down. This reduces power and cooling demand and other environmental impacts, and it lowers administrative and operational costs.
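A minimal sketch of the scale-down decision follows, assuming a hypothetical short-term forecast of concurrent sessions and a fixed per-server capacity. The function name servers_to_release and all numbers are illustrative assumptions.

```python
from typing import Sequence


def servers_to_release(active_servers: int,
                       forecast_sessions: Sequence[int],
                       sessions_per_server: int,
                       min_servers: int = 1) -> int:
    """Return how many servers can be shut down without starving the forecast.

    Sizing against the peak of the forecast window, not only the next
    interval, avoids shutting servers down just to restart them minutes later.
    """
    peak = max(forecast_sessions)
    needed = max(min_servers, -(-peak // sessions_per_server))  # ceiling division
    return max(0, active_servers - needed)


# Illustrative numbers: demand is predicted to fall after an event ends.
release = servers_to_release(active_servers=20,
                             forecast_sessions=[4000, 3500, 2000],
                             sessions_per_server=500)
print(f"{release} servers can be powered down")
```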

Automated AI/ML game cloud administration will not happen overnight. Today it is deployed in element-oriented form, such as in firewalls. In its next generation, AI will act as an advisor to humans. The full benefit will come when AI provides autonomous real-time administration.

Can we trust AI/ML?

Using AI/ML to administer networks is a fundamental change and, even though it is not in full bloom yet, it is coming quickly. So now is the time to start testing.

Test plans should answer questions like these:

  • Is AI predicting correctly? Testing can emulate traffic load changes over time and measure whether the AI system correctly predicts the changes. In addition, tests should measure how long the prediction model takes to get to 99.999% accuracy.

  • Does AI execute properly? Because the AI can be placed in an isolation sandbox, regression testing can make sure it behaves consistently and scales correctly across different traffic situations.

  • How do different vendor AI systems compare? Different vendors will use different technologies to develop their AI product. Vendor-neutral testing can determine how they stack up.

  • Are AI-proposed security configurations doing their job? Test models can verify whether AI blocks bad traffic and does not impact good traffic.
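Two of the questions above lend themselves to simple metrics. The sketch below is one possible way to score them, not a prescribed method: the accuracy definition (one minus the relative error per test window), the function names and the sample data are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def window_accuracies(predicted: Sequence[float], actual: Sequence[float]) -> list:
    """Accuracy per test window, defined here as 1 - relative error (floored at 0)."""
    return [max(0.0, 1.0 - abs(p - a) / a) for p, a in zip(predicted, actual) if a]


def first_window_meeting(accuracies: Sequence[float], target: float) -> Optional[int]:
    """Index of the first window whose accuracy reaches the target, else None."""
    return next((i for i, acc in enumerate(accuracies) if acc >= target), None)


def security_rates(flows: Sequence[Tuple[bool, bool]]) -> Tuple[float, float]:
    """flows holds (is_malicious, was_blocked) pairs from an emulated traffic mix.

    Returns (block_rate_for_bad_traffic, false_positive_rate_for_good_traffic).
    """
    bad = [blocked for malicious, blocked in flows if malicious]
    good = [blocked for malicious, blocked in flows if not malicious]
    block_rate = sum(bad) / len(bad) if bad else 0.0
    false_positive_rate = sum(good) / len(good) if good else 0.0
    return block_rate, false_positive_rate


# Illustrative data: the model under test improves over three windows.
acc = window_accuracies(predicted=[80, 95, 100], actual=[100, 100, 100])
converged_at = first_window_meeting(acc, target=0.9)

flows = [(True, True), (True, True), (True, False),
         (False, False), (False, True), (False, False), (False, False)]
block_rate, fp_rate = security_rates(flows)
print(converged_at, round(block_rate, 3), fp_rate)
```

The same functions can be run against each vendor's system with identical emulated traffic, which gives a like-for-like comparison.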

Testing can also play an important role in training AI systems before they go into the live network by emulating hundreds of situational permutations.
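To give a sense of how quickly situational permutations multiply, the sketch below enumerates every combination of a few emulated conditions. The dimensions and their values are hypothetical examples chosen for illustration.

```python
from itertools import product

# Hypothetical test dimensions; a real test plan would define its own.
dimensions = {
    "region": ["na-east", "na-west", "eu", "apac"],
    "load_multiplier": [0.5, 1, 2, 5, 10],
    "packet_loss_pct": [0, 0.1, 0.5, 1, 2],
    "stream": ["1080p", "4K", "8K"],
}

# One dict per scenario, covering every combination of the values above.
scenarios = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]

print(len(scenarios))   # 4 * 5 * 5 * 3 combinations
print(scenarios[0])
```

Four small dimensions already produce 300 scenarios, which is why emulation is better suited to this kind of training than manual test setup.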

While gaming and the metaverse represent the first use cases that need AI/ML cloud administration, in the future AI/ML will be ubiquitous, delivering benefits to any cloud.

Learn more about Spirent’s cloud and virtualization test and assurance and 5G Network Benchmarking for Cloud Gaming solutions.




Chris Chapman

Senior Methodologist, Spirent

With over 20 years in telecommunications and 11+ years of network performance theory, Chris has extensive knowledge in testing and deployment of L1-7 network systems. His expertise includes performance analysis of QoS, QoE, TCP, IP (v4 and v6), UDP, HTTP(S), FTP, WAN acceleration, BGP, OSPF, IS-IS, MPLS, LDP, RSVP, VPLS, firewalls and load balancers. His specialties are centered on testing L1-7.