NFV Service Assurance In Need of Big Data, Small Data or Both?
Sandra O'Boyle, Senior Analyst CEM & Customer Analytics, Heavy Reading
Operating NFV-based services and networks is the next priority for progressive communications service providers (CSPs). And it's not easy. In fact, CSPs are telling us that too much time and effort has been spent on VNF onboarding and far too little time and investment on the reality of operations in NFV environments.
NFV is shifting from a technology focus to operations, from "How do we do this?" to "How do we operate this?" A key challenge is integrating NFV network and service management into existing operations so that CSPs can run current networks and NFV cloud networks efficiently together.
How do we run NFV with existing networks and services? How do we assure on-demand services? And how do we offer dynamic customer SLAs? Without these answers, service providers cannot commercialize services and make the business case for moving to NFV and cloud networks.
The risk of not having the answers is that NFV will become a silo, a nice lab project that ends up being sidelined and not having any real impact on the bottom line. Ironically, the main driver behind NFV and SDN is a critical business-oriented one for service providers -- the need to launch personalized services quickly and be able to operate standardized, scalable networks.
More than 100 network operators and service providers worldwide participated in Heavy Reading's NFV Service Assurance and Analytics research study completed in the fourth quarter of 2017.
CSPs across the board say they are grappling with operationalizing NFV. This is not helped by organizational struggles including: internal knowledge and software skills gaps; differences of opinion between network and IT teams on new requirements and how to fill gaps in existing IT systems; as well as a lack of clear industry direction on what's required for NFV service assurance.
Nearly every issue around service assurance is rated a massive/significant challenge by at least 40% of CSPs.
The top five challenges rated "massive" or "significant" by CSPs in operationalizing NFV include:
|Challenge|CSPs rating it "massive" or "significant"|
|Assuring performance of multi-vendor VNFs|59%|
|Offering dynamic SLAs|58%|
|Integration/API issues between OSS and MANO|57%|
|Handling volume of data from VNFs in real time|54%|
|Assuring hybrid networks in common platform|52%|
- "Where we are today is that service assurance is a typical OSS IT function. It's offline, traditional, slow and not living up to the level where it needs to be, both in terms of automated processes or from a self-optimizing network on the radio side. What we are looking at is how to make this real-time on a granular level, which allows us to follow the sessions and respond and close the loop before a customer is impacted. We see this fitting in with MANO and the Orchestrator but still need a clearer picture on how it all works." -- Tier 1 European service provider
For CSPs that are already deploying NFV in live networks, key challenges include managing interoperability and performance across multiple VNF vendors. The traditional interfaces -- mobile signaling, management systems, and element managers (EMS)/configuration, for example -- are lagging behind. At the cloud layer, there is a virtualized infrastructure manager (VIM) that handles performance of individual VNFs. However, there's a gap in implementation when CSPs combine different vendors' VNFs to deliver a customer-facing service that needs to be provisioned, assured and monitored for end-to-end delivered service quality. CSPs need centralized platforms with real-time actionable data to proactively manage multiple network and services layers.
This is becoming a real service assurance issue with universal CPE platforms, where operators need to move beyond single vendor SD-WAN VNFs to deploy multiple lightweight VNFs -- firewall, IP PBX, load balancer, application acceleration -- from different vendors.
In this case, CSPs see active testing and monitoring as essential for managing service quality, troubleshooting customer issues and assuring that services work accurately after provisioning or after service reconfiguration, as well as meeting dynamic SLAs.
Service providers also want active virtual probes or test agents to be lightweight, with a small CPU and memory footprint, and able to be containerized, so that active testing can be done in a non-intrusive manner without interrupting real-time traffic. CSPs tell us that once you deploy a VNF, you need a highly automated virtual circle or lifecycle, from order management to assurance to re-fulfillment. This has to happen in a very orderly, automated fashion and be a very well-oiled engine without any noticeable disruption to the end user.
Active testing is also rated as highly valuable by 62% of CSPs, especially when automated and driven by an NFV orchestrator. There are a number of providers -- Netrounds, for example -- offering orchestrated and closed loop assurance with APIs.
This reflects the industry's eagerness to increase programmability and automation of networks (see Image 1 below). Service providers also want service orchestration to drive automation and process improvement with as little manual intervention as possible to deliver the service, as human intervention is the main source of outages and service or configuration problems.
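The orchestrated, closed-loop pattern described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: `activate_with_test` and its three callables (`apply_config`, `run_active_test`, `rollback`) are hypothetical names standing in for whatever the orchestrator and test agents actually expose.

```python
from typing import Callable


def activate_with_test(
    apply_config: Callable[[], None],
    run_active_test: Callable[[], bool],
    rollback: Callable[[], None],
    attempts: int = 3,
) -> bool:
    """Apply a service change, verify it with an active test, roll back on failure.

    All three callables are hypothetical hooks supplied by the caller.
    Returns True if the service passed an active test, False if it was rolled back.
    """
    apply_config()
    for _ in range(attempts):
        # An active test injects synthetic traffic and reports pass/fail.
        if run_active_test():
            return True
    # No attempt passed: restore the previous configuration without manual steps.
    rollback()
    return False
```

The point of the sketch is that the test result, rather than a person, decides whether the change stays in place.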
Dr. Stefan Vallin, Director of Product Strategy for Netrounds, argues in his recent paper, "Service Assurance In Need of Big Data or Small Data?," that data from active testing and monitoring yields detailed, real-time service KPIs, which can be referred to as "small data." This data provides great value by itself, but it is also an enabler for the successful application of big data and AI. Small data obtained from active testing and monitoring directly answers many of the most important service assurance questions, such as, "Are we meeting the level of service quality that we promised?"
Vallin goes on to make the following point: "If you can measure service quality directly, why would you try to reverse engineer it from noisy and incomplete data pulled from the resource layer? We traditionally have low-quality data from the resource layers -- we should put stronger requirements on devices to provide less data that is of higher quality and yields better answers to the questions being asked."
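To make the "small data" idea concrete, here is a minimal sketch of how a handful of active test samples could answer the "are we meeting what we promised?" question directly. The KPI names, the `ProbeSample` and `SlaTarget` types, and the thresholds are all assumptions chosen for illustration; they are not taken from Vallin's paper or from any product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ProbeSample:
    latency_ms: float  # round-trip time of one synthetic test packet
    lost: bool         # True if the packet never came back


@dataclass
class SlaTarget:
    max_avg_latency_ms: float
    max_loss_ratio: float


def evaluate_sla(samples: list[ProbeSample], target: SlaTarget) -> dict:
    """Compute service KPIs from active test samples and compare them to the SLA."""
    if not samples:
        raise ValueError("at least one sample is required")
    delivered = [s.latency_ms for s in samples if not s.lost]
    loss_ratio = sum(1 for s in samples if s.lost) / len(samples)
    # If every packet was lost, treat latency as unbounded.
    avg_latency = mean(delivered) if delivered else float("inf")
    return {
        "avg_latency_ms": avg_latency,
        "loss_ratio": loss_ratio,
        "sla_met": avg_latency <= target.max_avg_latency_ms
        and loss_ratio <= target.max_loss_ratio,
    }
```

A few dozen such samples per service are enough to produce a direct yes/no answer, which is the contrast with inferring service quality from large volumes of resource-layer counters.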
For customer-centric service assurance, service providers need to visualize their end-to-end services, be able to prioritize issues and avoid faults that impact customers, and reduce meaningless data overload. CSPs that focus on selling enterprise services in particular are concerned they risk drowning in data and buckets of alarms that are not correctly prioritized based on customer or service impact. They expect that, in the future, they should stop caring about device alarms in real time and only care about service alarms and monitoring of the service, since the MANO would handle policy/re-route decisions. They would then use big data analytics to correlate across layers for troubleshooting and forensics on recurring faults and root cause analysis -- network failure, memory/CPU issues, fault correlation -- rather than real-time service assurance.
It's also about managing faults in a more efficient way, moving away from manual trouble ticketing systems and non-time-sensitive alarms, and focusing on having network domains remediate and fix their own issues, while only pushing up service-level faults that need to be prevented or fixed right away, thereby automating fixes for customer-impacting problems.
- "The NFV environment needs to restore customer service itself and be service-aware, manage the customer's service and be responsible for the service." -- Tier 1 Asia Pacific service provider
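The split between faults that a domain handles itself and service-level faults that are pushed up can be sketched as a simple triage step. This is an illustrative assumption about how such a rule might look: the `Alarm` fields and the escalation rule are hypothetical and do not describe any particular MANO or OSS product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source: str              # device or service identifier
    layer: str               # "device" or "service"
    customers_affected: int  # customers currently impacted
    sla_at_risk: bool        # True if a customer SLA could be breached


def triage(alarms: list[Alarm]) -> tuple[list[Alarm], list[Alarm]]:
    """Split alarms into (escalate, handle_in_domain).

    Only service-layer alarms with customer or SLA impact are escalated;
    everything else is left for the originating domain to remediate.
    """
    escalate, local = [], []
    for alarm in alarms:
        impacting = alarm.sla_at_risk or alarm.customers_affected > 0
        if alarm.layer == "service" and impacting:
            escalate.append(alarm)
        else:
            local.append(alarm)
    # SLA-threatening alarms first, then by number of affected customers.
    escalate.sort(key=lambda a: (not a.sla_at_risk, -a.customers_affected))
    return escalate, local
```

Under this rule, a device alarm never reaches the service desk on its own; it surfaces only if it shows up as a service-level fault.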
In summary, service providers need both big data and small data to effectively operate NFV networks. The first takeaway is that there is clearly a role in service assurance for big data analytics, machine learning and AI, with the caveat that it must be based on relevant, high-quality data input rather than massive amounts of low-level data from resource layers. To this end, service providers are starting to work with VNF vendors to manage the volume of data from VNFs in real time and put stronger requirements on devices to provide less data that is of higher quality and that yields better answers to the questions being asked in Image 2 (above).
Higher quality data will help to train algorithms to become more sophisticated at predicting what's going to happen to prevent outages or degraded services.
The second takeaway is to adopt new data sources that can measure actual delivered service quality in real time and from the customer perspective -- the aforementioned "small data" that can directly provide CSPs with relevant service KPIs. This is the preferable way to understand the service quality and experience in the eyes of the customer, rather than trying to use resource data to generate service KPIs at a higher layer based on information from a lower layer. This is really important when it comes to configuring services, rolling back services quickly, but also being able to "take the pulse" of the customer experience in real time at key moments or on an ongoing basis.
This blog is sponsored by Netrounds. Read Stefan Vallin's full paper here: "Service Assurance In Need of Big Data or Small Data?"
— Sandra O'Boyle, Senior Analyst, Heavy Reading