Application Note: Best Practices for VoLTE Troubleshooting | NETSCOUT

Introduction

For service providers deploying LTE, a successful Voice over LTE (VoLTE) launch is a milestone that differentiates their HD audio/video service from over-the-top voice applications and services such as iTalkBB, Phonepower, Skype and Viber. Getting VoLTE right matters because voice is the most basic service that subscribers expect, and it is the service where a mishap is most noticeable. It must be done right the first time.

This whitepaper provides an overview for engineers and technicians who deploy and maintain VoLTE. It describes the key network elements and their roles in VoLTE calls, along with typical issues found during VoLTE deployment. It also offers best practices that highlight where to focus when troubleshooting VoLTE issues in live LTE networks.


Key Components for VoLTE

Evolved Packet Core (EPC): eNodeB, SGW, MME, PGW

These components work together to establish and maintain subscriber connectivity to the data network as the user equipment moves across the mobile network. One or more bearers need to be created through the IMS APN, and a special IP address should be assigned to the UE when making VoLTE calls.

IP Multimedia Subsystem (IMS)

The IMS contains application servers, call session controllers and media control functions that support inter-network calls and messaging, AAA and routing. SIP traffic is routed to the IMS after it leaves the EPC. The IMS determines where the callee resides (3G, PSTN or EPC) in order to route the SIP and RTP traffic.

Session Border Management

Session Border Management is the part of the IMS that enforces security, quality of service and admission/routing control between the EPC/IMS and other networks such as PSTN and 2G/3G. It governs how VoIP sessions are initiated, conducted and terminated between the Mobile Switching Center/Media Gateway (MSC/MGW) of the different network types. It works with the Media Gateway to provide inter-networking support and effective management of media codecs and signaling traffic between LTE and 2G/3G/PSTN. The Media Resource Function Processor (MRFP), which transcodes voice codecs between LTE and 3G/PSTN, is a key element between the EPC and the 2G/3G/PSTN.

Understanding the Basic VoLTE call process

  1. Connecting to LTE

    The UE needs to be authenticated and authorized by the MME so that it can be connected to the network.
  2. Connection and Registration to IMS Service

    (a) The UE requests data service with IMS to establish the default bearer. A new IPv4 and/or IPv6 address is assigned to the UE.
    (b) The UE then registers with IMS so that it is provisioned and calls can be directed to other VoLTE, 3G/2G or PSTN subscribers through the IMS.
  3. Making the Call

    When a UE initiates a call, it sends a SIP “Invite” message over the default bearer to establish a connection with the callee. The IMS receives the SIP message, locates the callee (LTE/3G/PSTN) and establishes the connection. If the callee is on the PSTN or on another service provider’s network, SIP and media are routed via the SBC and codec transcoding is performed by the MRF.
  4. Establishing Media Bearer

    The IMS instructs the PGW/APN to initiate establishment of the dedicated bearer that carries the voice packets as RTP and RTCP streams. Based on the 3GPP standard, QoS Class Identifier (QCI) 1 should be assigned to the voice bearer. The RTP stream (the voice conversation) is then transferred over the dedicated bearer. The dedicated bearer is deleted after the voice call ends.

QCI  Resource Type  Priority  Packet Delay Budget  Packet Error Loss Rate  Example Services
1    GBR            2         100 ms               10^-2                   Conversational Voice
2    GBR            4         150 ms               10^-3                   Conversational Video (Live Streaming)
3    GBR            3         50 ms                10^-3                   Real Time Gaming
4    GBR            5         300 ms               10^-6                   Non-Conversational Video (Buffered Streaming)
5    Non-GBR        1         100 ms               10^-6                   IMS Signaling
6    Non-GBR        6         300 ms               10^-6                   Video (Buffered Streaming), TCP-based (e.g., www, e-mail, chat, ftp, p2p file sharing, progressive video, etc.)
7    Non-GBR        7         100 ms               10^-3                   Voice, Video (Live Streaming), Interactive Gaming
8    Non-GBR        8         300 ms               10^-6                   Video (Buffered Streaming), TCP-based (e.g., www, e-mail, chat, ftp, p2p file sharing, progressive video, etc.)
9    Non-GBR        9         300 ms               10^-6                   Same as QCI 8
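The QCI table lends itself to a simple lookup when checking what was provisioned for a subscriber. The following is a minimal sketch, not a definitive tool: the `QciProfile` and `check_volte_bearers` names are hypothetical, the values are transcribed from the table (QCI 9 shares the delay and loss figures of QCI 8 in the 3GPP table), and the expectation that IMS signaling uses a QCI 5 bearer is an assumption based on the "IMS Signaling" row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QciProfile:
    resource_type: str      # "GBR" or "Non-GBR"
    priority: int
    delay_budget_ms: int
    loss_rate: float


# Standardized QCI characteristics, transcribed from the table.
QCI_TABLE = {
    1: QciProfile("GBR", 2, 100, 1e-2),
    2: QciProfile("GBR", 4, 150, 1e-3),
    3: QciProfile("GBR", 3, 50, 1e-3),
    4: QciProfile("GBR", 5, 300, 1e-6),
    5: QciProfile("Non-GBR", 1, 100, 1e-6),
    6: QciProfile("Non-GBR", 6, 300, 1e-6),
    7: QciProfile("Non-GBR", 7, 100, 1e-3),
    8: QciProfile("Non-GBR", 8, 300, 1e-6),
    9: QciProfile("Non-GBR", 9, 300, 1e-6),
}


def check_volte_bearers(bearer_qcis):
    """Return findings for the QCIs seen on a UE's IMS bearers during a call.

    Assumption: signaling rides a QCI 5 bearer and voice a QCI 1 bearer.
    Operators may provision other values, so treat findings as hints.
    """
    findings = []
    if 5 not in bearer_qcis:
        findings.append("no QCI 5 bearer for IMS signaling")
    if 1 not in bearer_qcis:
        findings.append("no QCI 1 dedicated bearer for voice")
    for qci in bearer_qcis:
        if qci not in QCI_TABLE:
            findings.append(f"QCI {qci} is operator-specific; check provisioning")
    return findings
```

For example, `check_volte_bearers([5])` reports that the voice bearer is missing, which points the investigation at dedicated bearer setup rather than at IMS registration.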

Bearers

Bearers are GTP-U based tunnels created to carry a subscriber's data traffic across the EPC. When a subscriber’s equipment connects to the network and establishes connections to data services via PDN Gateways (PGWs), default bearers are created to carry the base communication protocol for each data service. Two types of bearers may be created: Guaranteed Bit Rate (GBR) and non-Guaranteed Bit Rate (nGBR). GBR bearers are assigned guaranteed bandwidth to carry traffic that is sensitive to jitter and packet drops, such as voice over RTP. A voice-carrying GBR bearer consumes resources, so it is created when a VoLTE call is set up successfully and deleted as soon as the call ends. nGBR bearers are typically created for best-effort data traffic such as Internet traffic. Most default bearers, including the VoLTE default bearer that carries SIP traffic and those for non-critical Internet service, are nGBR.
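Because each bearer is a GTP-U tunnel, the tunnel endpoint identifier (TEID) in the GTP-U header is what ties a captured user-plane packet to a bearer. The sketch below decodes the GTPv1-U header from a UDP payload under stated assumptions: the function name `parse_gtpu` is hypothetical, the input is the UDP payload of a packet seen on the GTP-U port, and only G-PDU messages (those that carry subscriber data) are of interest.

```python
import struct

GTPU_PORT = 2152   # UDP port used by GTP-U
G_PDU = 0xFF       # message type of a GTP-U message carrying user data


def parse_gtpu(udp_payload):
    """Return (teid, inner_packet) for a G-PDU, or None for anything else."""
    if len(udp_payload) < 8:
        return None
    flags, msg_type, length, teid = struct.unpack("!BBHI", udp_payload[:8])
    if flags >> 5 != 1 or msg_type != G_PDU:
        return None
    end = 8 + length   # the length field counts bytes after the first 8
    if end > len(udp_payload):
        return None
    offset = 8
    if flags & 0x07:
        # E, S or PN flag set: sequence number (2 bytes), N-PDU number
        # (1 byte) and next extension header type (1 byte) are present.
        if end < 12:
            return None
        next_ext = udp_payload[11] if flags & 0x04 else 0
        offset = 12
        while next_ext != 0:
            # Each extension header starts with its length in 4-byte units
            # and ends with the type of the following extension header.
            if offset >= end:
                return None
            ext_len = udp_payload[offset] * 4
            if ext_len == 0 or offset + ext_len > end:
                return None
            next_ext = udp_payload[offset + ext_len - 1]
            offset += ext_len
    return teid, udp_payload[offset:end]
```

Grouping packets by the returned TEID separates the default bearer (SIP) from the dedicated bearer (RTP) of the same subscriber, provided the TEIDs have been learned from the control-plane bearer setup messages.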

QoS Class Identifier (QCI)

The QCI indicates the QoS parameters (packet delay budget and packet loss rate) as well as the priority class for each bearer. QCI assignment is based on the subscriber’s profile in the HSS and the data service provisioned by the service provider. Although 3GPP offers nine suggested QCI values as a reference, service providers can assign their own QCI values to data services.

VoLTE Challenges

  1. High Traffic Volume

    All VoLTE traffic is IP based. Call signaling is based on TCP/SIP and audio is carried over UDP/RTP with AMR-WB as the audio codec. These VoLTE IP flows will be buried among all other IP data traffic flows in the LTE core, including streaming video and Internet traffic.

  2. Different Paths

    When a VoLTE call is made, the control signals that build the data bearers take different paths than the media traffic. In addition, SIP signaling and media traffic take different paths and traverse different network elements after leaving the EPC. Troubleshooting VoLTE call setup and quality issues requires visibility into, and correlation between, the control signaling and the user bearers created.

  3. Segment-to-segment Visibility

    QoS of the audio is assured by dynamically creating a dedicated bearer across multiple interfaces in the EPC. End-to-end root cause analysis of audio quality issues requires visibility into, and correlation of, the QCI parameters established across multiple segments.

  4. Asymmetric Media Flows Visibility

    A different VLAN may be assigned to each direction of the RTP flow in and around the EPC. Engineers must be able to correlate SIP and RTP flows and extract packets despite this asymmetry in order to see all RTP packets. When troubleshooting abnormal VoLTE call drops, engineers need to extract the packets to analyze the timing and behavior of the RTP payload.
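One common way to correlate SIP with RTP is to read the media address and port that each side advertises in the SDP body of its SIP messages, then match captured RTP flows against those endpoints. The sketch below illustrates the idea under stated assumptions: the function names and input shapes are hypothetical, SDP bodies have already been extracted from the SIP messages of each call, and flows are matched on IP address and port only, so the VLAN tag of either direction is deliberately ignored.

```python
def media_endpoint_from_sdp(sdp):
    """Return (ip, port) of the first audio stream in an SDP body, or None."""
    session_ip = None
    media_ip = None
    port = None
    seen_media = False
    in_audio = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            if port is not None:
                break                      # first audio section is complete
            seen_media = True
            fields = line[2:].split()
            in_audio = len(fields) >= 2 and fields[0] == "audio"
            if in_audio:
                port = int(fields[1].split("/")[0])
        elif line.startswith("c="):
            fields = line[2:].split()      # e.g. "IN IP4 10.0.0.1"
            if len(fields) == 3:
                address = fields[2].split("/")[0]
                if in_audio:
                    media_ip = address     # media-level address wins
                elif not seen_media:
                    session_ip = address   # session-level default
    if port is None:
        return None
    return (media_ip or session_ip, port)


def correlate(calls, rtp_flows):
    """Map each call to its RTP flows.

    calls: {call_id: [sdp_text, ...]} with the offer and answer of each call.
    rtp_flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.
    """
    endpoint_to_call = {}
    for call_id, sdps in calls.items():
        for sdp in sdps:
            endpoint = media_endpoint_from_sdp(sdp)
            if endpoint is not None:
                endpoint_to_call[endpoint] = call_id
    matched = {}
    for flow in rtp_flows:
        src, dst = (flow[0], flow[1]), (flow[2], flow[3])
        call_id = endpoint_to_call.get(dst) or endpoint_to_call.get(src)
        if call_id is not None:
            matched.setdefault(call_id, []).append(flow)
    return matched
```

Matching on the advertised endpoints rather than on the capture interface or VLAN means both directions of the conversation land under the same call, even when they were tapped on different segments. Addresses are compared as strings here, so IPv6 addresses would need to be normalized first.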

Best Practices for VoLTE Troubleshooting

  1. First, connect and capture traffic to gain visibility of traffic across c-plane and u-plane interfaces. Typically, an aggregation switch such as one from VSS Monitoring or Brocade can filter, aggregate and load-balance traffic to the tool with very low latency.

  2. From the traffic captured, analyze the VoLTE related bearer setup and QoS parameters provisioned.

  3. Analyze SIP flows across the EPC to IMS and SBC for calls to 2G/3G/PSTN.

  4. Track RTP flows correlated to calls made across interfaces in the EPC and across MRFP and SBC for calls to CDMA/PSTN.

Network Time Machine offers high-performance packet capture at up to 20 Gbps with post-capture analysis that correlates c-plane to u-plane and SIP to RTP, so the user can go back in time to conduct end-to-end root cause analysis of VoLTE issues.
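To illustrate what tracking an RTP flow involves once its packets have been extracted, the sketch below decodes the fixed RTP header and derives two simple indicators for one direction of one call: the number of packets lost (expected minus received, using sequence numbers with wraparound) and the longest gap between arrivals. The function names and the `(arrival_time, udp_payload)` input shape are assumptions for illustration; this is not how any particular analyzer computes its statistics.

```python
import struct


def parse_rtp_header(udp_payload):
    """Decode the 12-byte fixed RTP header; return None if not RTP version 2."""
    if len(udp_payload) < 12:
        return None
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", udp_payload[:12])
    if b0 >> 6 != 2:
        return None
    return {"payload_type": b1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


def stream_stats(packets):
    """Summarize one RTP stream.

    packets: list of (arrival_time_seconds, udp_payload) in capture order,
    all belonging to the same direction of the same call.
    """
    received = 0
    base_seq = None        # first sequence number seen
    max_ext_seq = None     # highest sequence number, extended past 16 bits
    max_gap = 0.0
    prev_arrival = None
    for arrival, udp_payload in packets:
        header = parse_rtp_header(udp_payload)
        if header is None:
            continue
        received += 1
        seq = header["seq"]
        if base_seq is None:
            base_seq = max_ext_seq = seq
        else:
            delta = (seq - max_ext_seq) & 0xFFFF
            if delta < 0x8000:
                max_ext_seq += delta       # in order, wraparound included
            # otherwise the packet arrived late and does not advance the max
            max_gap = max(max_gap, arrival - prev_arrival)
        prev_arrival = arrival
    if received == 0:
        return {"received": 0, "lost": 0, "max_gap_s": 0.0}
    expected = max_ext_seq - base_seq + 1
    return {"received": received,
            "lost": max(expected - received, 0),
            "max_gap_s": max_gap}
```

Running this at each capture point along the path and comparing the results shows on which segment packets start to go missing or stall, which is the purpose of tracking the same RTP flow across interfaces.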

Time Synchronization Consideration

Correlating c-plane to u-plane traffic and SIP to RTP traffic is difficult unless all of the traffic is captured on a single device, where all timestamps come from the same clock. When multiple capture devices are used, their timestamp mechanisms must be synchronized using external NTP or PTP/GPS clock sources. As an alternative, advanced aggregation switches, such as those from VSS Monitoring, accept external clock sources and can add a timestamp to the trailer of each packet. Capture devices that use this timestamp to reconstruct the packet when exporting for correlated analysis make the engineer's work much easier.
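When captures from several devices have to be analyzed together, the records must be interleaved on a common time base before any correlation is attempted. The sketch below shows one way to do that, assuming each capture is already sorted by timestamp. The `merge_captures` name, the record layout and the per-device `clock_offsets` correction are hypothetical; the offsets stand in for whatever residual skew has been measured between devices, and properly synchronized devices would not need them.

```python
import heapq


def merge_captures(captures, clock_offsets=None):
    """Merge per-device captures into one time-ordered list.

    captures: {device_name: [(timestamp_seconds, packet), ...]}, each list
              sorted by timestamp.
    clock_offsets: {device_name: seconds to add to that device's timestamps}.
    Returns a list of (corrected_timestamp, device_name, packet).
    """
    clock_offsets = clock_offsets or {}

    def corrected(device, records):
        offset = clock_offsets.get(device, 0.0)
        for timestamp, packet in records:
            yield (timestamp + offset, device, packet)

    streams = [corrected(device, records) for device, records in captures.items()]
    # heapq.merge keeps the inputs lazy and only compares the timestamps.
    return list(heapq.merge(*streams, key=lambda record: record[0]))
```

With a device that runs 0.4 seconds behind, for instance, an uncorrected merge would place its SIP messages before the bearer setup that actually preceded them, which is exactly the kind of false ordering that makes cross-device correlation misleading.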

Scenario 1

The user cannot make calls at all.

  • Default bearer was not set up
  • User did not register with IMS (Authentication issue, IMS overloaded)

For these issues, connect to the S1 and S11 interfaces to examine the initial connection and bearer setup process of the UE, and to check whether there are SIP flows from the UE. An analyzer such as Network Time Machine can capture all traffic at S1 and S11 at up to 20 Gbps, select the UE of interest and show the default bearer and dedicated bearer setup procedure with IMS.
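The reasoning for this scenario is a walk through the call process in order, stopping at the first stage for which no evidence appears in the capture. A minimal sketch follows; the event names and the `first_failed_stage` helper are hypothetical labels for what an engineer would look for in the S1 and S11 traces, not output of any specific tool.

```python
# Stages in the order the UE must complete them before it can place a call.
# The event names are illustrative labels, not protocol message names.
STAGES = [
    ("attach", "UE did not complete attach and authentication with the MME"),
    ("default_bearer", "default bearer to the IMS APN was not set up"),
    ("sip_register", "UE sent no SIP REGISTER over the default bearer"),
    ("sip_register_ok", "IMS did not accept the registration"),
]


def first_failed_stage(observed_events):
    """Return a diagnosis for the earliest missing stage, or None if all passed."""
    for event, diagnosis in STAGES:
        if event not in observed_events:
            return diagnosis
    return None
```

For a UE where only `attach` and `default_bearer` were observed, the helper points at the missing SIP REGISTER, so the next step is to inspect the UE's user-plane traffic on the default bearer rather than the bearer setup itself.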

Scenario 2

The user cannot make calls to PSTN users.

With the support of IMS, calls from VoLTE subscribers can be routed to the PSTN. When this fails, the SIP flows on each segment, end to end, need to be captured and examined to determine whether and where the call setup process failed. For engineers responsible for the EPC, it is important to capture the signaling traffic around the demarcation point between the IMS and the Regional Core, i.e. between the PGW and the IMS, around the SBC and across the MRFP for media conversion. Note any errors and call setup delays, and examine packet payloads starting from the point where the failure originates to find the root cause.

During troubleshooting, look for the SIP flow that exhibited the failure. The SIP response code hints at why a call failed. For example, 503 Service Unavailable may mean that a service is overloaded or misconfigured.

Network Time Machine can analyze the captured SIP traffic and provide statistics on SIP errors and the calls that triggered them.
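The kind of SIP error statistic described here can be approximated from raw SIP messages by reading the status line of each response and grouping failures by Call-ID. The following is a rough sketch, assuming the SIP messages have already been extracted from the capture as text; the `sip_error_stats` name and its input shape are hypothetical, and it does not represent how Network Time Machine works internally.

```python
from collections import Counter, defaultdict


def sip_error_stats(messages):
    """Count SIP failure responses (4xx, 5xx, 6xx) and list the affected calls.

    messages: iterable of raw SIP messages as text.
    Returns (counts, calls) where counts maps status code -> occurrences and
    calls maps status code -> sorted list of Call-ID values.
    """
    counts = Counter()
    calls = defaultdict(set)
    for message in messages:
        lines = message.splitlines()
        if not lines:
            continue
        parts = lines[0].split(" ", 2)     # "SIP/2.0 503 Service Unavailable"
        if len(parts) < 2 or parts[0] != "SIP/2.0" or not parts[1].isdigit():
            continue                       # a request, or not SIP at all
        code = int(parts[1])
        if code < 400:
            continue
        call_id = None
        for line in lines[1:]:
            if line == "":
                break                      # end of headers
            name, sep, value = line.partition(":")
            # "i" is the compact form of the Call-ID header.
            if sep and name.strip().lower() in ("call-id", "i"):
                call_id = value.strip()
                break
        counts[code] += 1
        if call_id is not None:
            calls[code].add(call_id)
    return counts, {code: sorted(ids) for code, ids in calls.items()}
```

Retransmitted responses are counted each time they appear, so the per-code totals are an upper bound, while the Call-ID lists identify which individual calls to pull from the capture for packet-level inspection.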