Why Confidence in 5G Cloud Performance Begins with Resilience

In 5G cloud environments where underlying architectures are more dynamic than ever, it’s critical to evaluate not just each function, but also understand its vulnerabilities based on every possible change in the network. Learn how Spirent has identified the critical areas where lab and pre-production testing techniques must evolve to ensure resilience in the production network.

Organizations are striving to migrate to the cloud as they deploy 5G, and network vendors such as Cisco, Nokia and Ericsson are releasing a host of Cloud-native Network Functions (CNFs). The hyperscalers are wooing the network operators as they try to move into the communications space. Operators are carefully considering who they will partner with as they face the challenge of getting new business models in place within 5G environments, where underlying architectures are more dynamic than ever. This means learning how to effectively deploy and integrate disparate 5G functions in the cloud.

In the old days of 3G and 4G, operators would buy monolithic appliances that might account for 40 or 50 functional components, essentially comprising the network stack. In 5G environments, these functions are going cloud native. The components that originally made up the stack have been massively decoupled. The stack has become decentralized, and service providers are scrambling to make all of its parts work together to deliver superior Quality of Experience (QoE).

Pain points have been largely unrelenting. The one we’ve been discussing most with customers lately is the unpredictable nature of the highly fragmented cloud environments that support emerging 5G core deployments. Specifically, the production issues and failures lurking there are largely a result of the dizzyingly dynamic nature of these environments.

Case in point: in 5G networks, all network functions are being containerized and deployed in decentralized environments split between edge and core. Each network function consists of multiple pods and nodes requiring security protection and Kubernetes management, representing new challenges creeping into the cloud-native environment. In the past, testing these functions was relatively straightforward. Now, there are so many moving parts that it’s become critical to evaluate not just each function, but also understand its vulnerabilities based on every possible change in the network.

Say a network function has 40 different pods or nodes: the application pod, database pod, management pod, etc. All must talk to each other. Internally, they all assume certain latencies, after which they will time out. With an end-to-end flow budget of 20 milliseconds, each flow between nodes must take less than a millisecond. In real-world networks, many things can go wrong: a noisy neighbor, pressure on the network and more. One millisecond can become two milliseconds when talking to another node, and the effect can ripple outward. This can result in timeouts, lost packets or dropped calls. For an operator required to meet certain SLAs, this is a serious concern. Myriad other examples of possible service failure exist in the new CNF world.
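To make the arithmetic concrete, here is a minimal Python sketch. It assumes, purely for illustration, a flow that crosses 20 pod-to-pod hops one after another. The hop count, the per-hop values and the function names are hypothetical and not taken from any real deployment.

```python
def end_to_end_latency_ms(hop_latencies_ms):
    """Total latency of a flow that crosses its hops one after another."""
    return sum(hop_latencies_ms)


def exceeds_budget(hop_latencies_ms, budget_ms=20.0):
    """True when the summed hop latencies exceed the end-to-end budget."""
    return end_to_end_latency_ms(hop_latencies_ms) > budget_ms


# Hypothetical healthy flow: 20 sequential hops, each just under 1 ms.
healthy = [0.9] * 20

# Same flow, but a noisy neighbor pushes 3 of the hops to 2 ms.
degraded = [2.0] * 3 + [0.9] * 17

print("healthy exceeds budget:", exceeds_budget(healthy))
print("degraded exceeds budget:", exceeds_budget(degraded))
```

In this toy model, only three slow hops are enough to push the whole flow past its 20 millisecond budget, even though the other 17 hops behave normally.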

The bottom line: before any individual function can go live in the production network, comprehensive resilience testing must first be conducted. The alternative is to risk poor performance or even complete service failure.

The rise of mass decentralization represents a new reality

In less than a decade, we’ve gone from large, monolithic boxes to 40 or 50 moving parts supporting just one network function. These functions are deployed networkwide in a range of environments and locations, atop architecture that is also highly dynamic and widely distributed, from the edge all the way back to the core. They now consist of multiple pods and nodes, each installed on Kubernetes clusters that can be deployed in a variety of ways.

Leading 5G CNF vendors have microservices architectures with many pods/functions deployed with Kubernetes.

What does this mean in reality? That 5G cloud-native network function architectures require a new resilience, scalability, security, and performance paradigm in testing to assure customer QoE.

Emerging and established operators alike are grappling with this reality, as are the network equipment manufacturers that support them. They’ve long recognized the importance of running common tests around conformance, functionality, and performance. The difference this time is that testing isn’t taking place in a stable environment, meaning much is left to trial and error. Not to mention the enormous amount of time required to jump through so many new complex hoops.

The scale and depth of this evolution make it infeasible to apply old approaches to this new paradigm.

Advancing testing techniques to ensure robust resilience

Spirent has identified the following critical areas where lab and pre-production testing techniques must evolve to ensure resilience in the production network:

  • Introduce new impairments. Pod failures, resource contention and latency within a 5G cloud-native microservices architecture must all be tested in an automated fashion. Injecting cloud-native impairments inside and in between network functions will help uncover vulnerabilities.

  • Key failure indicator (KFI) assessments. Operators must understand KFIs for each 5G cloud-native network function, based on automated cross-analysis and correlation summaries that can be referenced in production deployments via a service assurance platform.
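The two areas above can be sketched together as a toy model: inject one impairment at a time and record which ones break the flow. This is an illustration only, not Spirent's tooling. The `Impairment` class, the link names, the baseline latencies and the 2.0 millisecond hop timeout are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impairment:
    """A hypothetical fault to inject on one pod-to-pod link."""
    name: str
    link: str
    added_latency_ms: float = 0.0
    link_down: bool = False


def flow_outcome(baseline_ms, impairments, hop_timeout_ms=2.0):
    """Return 'ok', or the reason the toy flow fails under the impairments."""
    latencies = dict(baseline_ms)  # copy so the baseline is not modified
    for imp in impairments:
        if imp.link_down:
            return f"failed: {imp.link} unreachable"
        latencies[imp.link] += imp.added_latency_ms
    for link, ms in latencies.items():
        if ms > hop_timeout_ms:  # assumed per-hop timeout
            return f"failed: {link} timed out at {ms:.1f} ms"
    return "ok"


def find_failure_indicators(baseline_ms, candidates):
    """Inject each candidate alone and keep the ones that break the flow."""
    results = {}
    for imp in candidates:
        outcome = flow_outcome(baseline_ms, [imp])
        if outcome != "ok":
            results[imp.name] = outcome
    return results


baseline = {"app->db": 0.8, "app->mgmt": 0.6}  # hypothetical healthy latencies
candidates = [
    Impairment(name="db latency +0.5 ms", link="app->db", added_latency_ms=0.5),
    Impairment(name="db latency +1.5 ms", link="app->db", added_latency_ms=1.5),
    Impairment(name="mgmt pod killed", link="app->mgmt", link_down=True),
]
for name, outcome in find_failure_indicators(baseline, candidates).items():
    print(name, "->", outcome)
```

The impairments that survive this filter are candidates for the failure indicators an operator would then watch for in production. A real test campaign would inject faults into a running cluster rather than into a dictionary of numbers.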

It’s all about understanding what each application and function needs to succeed, then identifying every possible point of failure. Some failures can be addressed permanently before they hit the production network as part of a customer’s existing CI/CD/CT (continuous integration/continuous delivery/continuous testing) pipeline. Others will be inevitable, always lurking, hence the need for automated continuous testing.

Success is efficiently entering production with no surprises. Operators know what to watch, understand how failures will impact the network and service delivery, and take immediate, planned action the moment an issue begins to arise. In the example cited above, if the flow between pods exceeds 1.5 milliseconds, an alarm should be generated, so that appropriate automated scaling can take place to avoid service degradation or failure.
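As a rough sketch of that alarm rule, the Python below flags any pod-to-pod link whose recent latency samples all exceed 1.5 milliseconds. The threshold comes from the example above. The requirement of three consecutive samples, the link names and the function name are assumptions added here to keep a single spike from triggering an alarm.

```python
ALARM_THRESHOLD_MS = 1.5  # threshold taken from the example in the text


def links_to_alarm(samples_ms, threshold_ms=ALARM_THRESHOLD_MS, consecutive=3):
    """Return links whose latest `consecutive` samples all exceed the threshold.

    `consecutive` is an assumed debounce setting, not part of the source example.
    """
    alarmed = []
    for link, samples in samples_ms.items():
        recent = samples[-consecutive:]
        if len(recent) == consecutive and all(s > threshold_ms for s in recent):
            alarmed.append(link)
    return alarmed


# Hypothetical latency samples in milliseconds, oldest first.
samples = {
    "app->db": [0.9, 1.6, 1.7, 1.8],    # sustained degradation
    "app->mgmt": [0.7, 1.9, 0.8, 0.7],  # one transient spike
}
print(links_to_alarm(samples))
```

In a production setup, the returned links would feed whatever automated scaling or remediation action the operator has planned for that indicator.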

The latest advancements for 5G network function testing are all about being proactive about resilience, with the right capabilities, in new network environments defined by decentralization and containerization. It is the only way to ensure today’s networks remain as bulletproof as those that came before, no matter how complex they become.

To learn more, check out Spirent's Cloud Testing and Virtual Infrastructure Validation solutions or read the Assuring The Promise of Hybrid Cloud Services eBook.

Aloke Tusnial

Vice President, Cloud Business & Solutions, Spirent

Aloke, a telecom veteran with more than 20 years of experience, drives the Cloud business at Spirent. Prior to Spirent, he was the CTO at Netcracker, where he bootstrapped and built a thriving business focused on Software Defined Networking (SDN) and Network Function Virtualization (NFV) from the ground up. He was responsible for leading the product definition, sales strategy and customer engagement for the company's SDN and NFV initiatives. Prior to Netcracker, he held a variety of senior positions in presales, architecture, strategy and account management roles, serving a global set of customers focused on real-time B/OSS.