Mean Opinion Score and Metrics


The UC solution reports several measures of voice quality that you can use to monitor the quality of experience being delivered to end users. This section explains how voice quality is measured and describes the different scales that the QoE Monitoring Server uses.

All measures of voice quality are ultimately based on subjective testing, because the quality of speech is defined by how people perceive it. There are several methodologies for subjective testing. Most voice quality measures are based on an absolute category rating (ACR) scale.

In an ACR subjective test, a statistically significant number of people rate their quality of experience on a scale from 1 (bad) to 5 (excellent). The average of these ratings is called the mean opinion score (MOS). The resulting MOS depends on the range of experiences the group was exposed to and on the type of experience being rated. As a result, MOS values from different tests cannot be compared unless the test conditions are the same.
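The Python below is a minimal sketch of this arithmetic: it averages a panel's ACR ratings into a MOS. The intermediate labels (poor, fair, good) come from the ITU-T P.800 ACR scale, not from this section. The function name `mean_opinion_score` is hypothetical. The sketch shows only how a subjective-test MOS is formed, not how the UC solution predicts one.

```python
from typing import Sequence

# ACR scale as defined in ITU-T P.800; the endpoints 1 (bad) and 5 (excellent)
# match the scale described above.
ACR_LABELS = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Average a panel's ACR ratings (each from 1 to 5) into a mean opinion score."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in ACR_LABELS:
            raise ValueError(f"ACR ratings must be integers from 1 to 5, got {rating!r}")
    return sum(ratings) / len(ratings)


# Five listeners rate the same speech sample.
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```

A score of 4.0 from one panel is comparable only with scores from tests run under the same conditions.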

Because it is impractical to conduct subjective tests of voice quality for a live communication system, the UC solution generates ACR MOS values by using advanced algorithms to objectively predict the results of a subjective test. The UC solution provides two classes of MOS values, listening quality MOS (MOS-LQ) and conversational quality MOS (MOS-CQ).

MOS-LQ is the most commonly used MOS value in the VoIP (Voice over IP) industry. It measures the quality of audio for listening purposes only and does not take into account any bidirectional effects, such as delay and echo. MOS-CQ takes into account listening quality in each direction as well as the bidirectional effects.

The UC solution uses both narrowband (8 kHz sample rate) and wideband (16 kHz sample rate) audio codecs. To keep MOS-LQ values consistent across codecs, all MOS-LQ values are reported on the wideband MOS-LQ scale rather than on the traditional narrowband MOS-LQ scale that other systems provide.

The difference between the wideband and narrowband MOS-LQ scales lies in the range of experiences played to the group of people in the subjective test. For the narrowband MOS-LQ scale, the group hears only speech encoded with narrowband codecs, so no audio frequency content above 4 kHz is present. For the wideband MOS-LQ scale, the group hears speech encoded with both narrowband and wideband codecs. Because listeners prefer the additional audio frequency content that wideband audio can represent, narrowband codecs score lower on the wideband MOS-LQ scale than on the narrowband MOS-LQ scale. For example, G.711 is typically cited as having a narrowband MOS-LQ score of approximately 4.1, but when compared with wideband codecs on a wideband MOS-LQ scale, G.711 may score only about 3.5.
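The 4 kHz ceiling for narrowband audio follows from the sampling theorem. A signal sampled at a rate f can represent frequencies only up to f/2. The short sketch below shows this arithmetic for the two sample rates named above; the function name is hypothetical. Real codecs usually pass somewhat less than this theoretical limit.

```python
def max_audio_frequency_hz(sample_rate_hz: float) -> float:
    """Nyquist limit: the highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


NARROWBAND_SAMPLE_RATE_HZ = 8_000
WIDEBAND_SAMPLE_RATE_HZ = 16_000

print(max_audio_frequency_hz(NARROWBAND_SAMPLE_RATE_HZ))  # 4000.0
print(max_audio_frequency_hz(WIDEBAND_SAMPLE_RATE_HZ))    # 8000.0
```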

Metrics Descriptions

The UC solution provides several different MOS values:

  • Listening MOS

  • Sending MOS

  • Network MOS

  • Conversational MOS

Listening MOS

Listening MOS is a prediction of the wideband Listening Quality Mean Opinion Score (MOS-LQ) of the audio stream that is played to the user. This value takes into account audio fidelity, distortion, and speech and noise levels, and from this data predicts how a large group of users would rate the quality of the audio they hear.

The Listening MOS varies depending on:

  • The codec used

  • Whether the codec is wideband or narrowband

  • The characteristics of the audio capture device used by the person speaking (the person sending the audio)

  • Any transcoding or mixing that occurred

  • Defects from packet loss or packet loss concealment

  • The speech level and background noise of the person speaking (person sending the audio)

Because so many factors influence this value, Listening MOS is most useful when viewed statistically across many calls rather than for a single call.
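Viewing MOS statistically can be as simple as summarizing the per-call values for a group of calls. The following Python is a sketch that assumes you already have a list of per-call MOS values, for example exported from a report. `summarize_mos` and `MosSummary` are hypothetical names. The 10th percentile is only one possible way to highlight the worst calls. The same approach applies to Sending MOS and Conversational MOS.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MosSummary:
    calls: int
    mean: float
    median: float
    p10: float  # 10th percentile: roughly the level the worst 10% of calls fall below


def summarize_mos(values: Sequence[float]) -> MosSummary:
    """Summarize per-call MOS values (Listening, Sending, or Conversational)."""
    if len(values) < 2:
        raise ValueError("need at least two calls for a statistical summary")
    # Nine cut points; "inclusive" keeps them within the observed range.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return MosSummary(
        calls=len(values),
        mean=statistics.fmean(values),
        median=statistics.median(values),
        p10=deciles[0],
    )


print(summarize_mos([3.1, 3.4, 3.6, 2.2, 3.5, 3.7]))
```

A low 10th percentile combined with a healthy mean suggests that a minority of calls is affected, which is the kind of pattern a single call cannot reveal.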

Sending MOS

Sending MOS is a prediction of the wideband Listening Quality Mean Opinion Score (MOS-LQ) of the audio stream that is being sent from the user. This value takes into consideration the speech and noise levels of the user along with any distortions, and from this data predicts how a large group of users would rate the audio quality they hear.

The Sending MOS varies depending on:

  • The user's audio capture device characteristics

  • The speech level and background noise at the user's device

Because so many factors influence this value, Sending MOS is most useful when viewed statistically across many calls rather than as a single value.

Network MOS

Network MOS is a prediction of the wideband Listening Quality Mean Opinion Score (MOS-LQ) of audio that is played to the user. This value takes into account only network factors, such as the codec used, packet loss, packet reordering, packet errors, and jitter.

The difference between Network MOS and Listening MOS is that Network MOS considers only the impact of the network on listening quality, whereas Listening MOS also considers the payload (speech level, noise level, and so on). This makes Network MOS useful for identifying network conditions that affect the audio quality being delivered.

For each codec, there is a maximum possible Network MOS that represents the best possible Listening Quality MOS under perfect network conditions. The following table shows the codec typically used in each scenario and the corresponding maximum Network MOS.

Table 1. Codecs Used in Scenarios with Maximum Network MOS

Scenario           Codec          Max NMOS
PC-PC call         RTAudio WB     4.10
Conference call    Siren          3.72
PC-PSTN call       RTAudio NB*    2.95
PC-PSTN call       Siren*         3.72

* The codec used in PC-PSTN can either be Siren or RTAudio NB depending on the configuration of the Mediation Server.

Because the maximum Network MOS varies by scenario (different scenarios use different codecs), it is usually more informative to look at the average degradation of the Network MOS during the call. The average degradation can be broken down into the portion caused by network jitter and the portion caused by packet loss. For very small degradations, the cause may not be available.
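The sketch below illustrates why degradation is easier to compare across scenarios than raw Network MOS. It measures each call against the maximum Network MOS for its codec, taken from Table 1. Several elements are assumptions for illustration:

  • The formula: MOS points below the codec maximum, and that value as a fraction of the maximum
  • The helper names
  • The `min_total` threshold for leaving small degradations unattributed

This section does not give the exact calculation that the QoE Monitoring Server uses.

```python
from dataclasses import dataclass

# Maximum Network MOS per codec, copied from Table 1.
MAX_NETWORK_MOS = {
    "RTAudio WB": 4.10,
    "Siren": 3.72,
    "RTAudio NB": 2.95,
}


@dataclass
class NetworkDegradation:
    points: float    # MOS points below the codec's maximum Network MOS
    fraction: float  # points as a fraction of that maximum


def network_degradation(codec: str, avg_network_mos: float) -> NetworkDegradation:
    """Express a call's average Network MOS relative to its codec's ceiling."""
    max_nmos = MAX_NETWORK_MOS[codec]
    points = max(0.0, max_nmos - avg_network_mos)
    return NetworkDegradation(points=points, fraction=points / max_nmos)


def dominant_cause(jitter_points: float, loss_points: float,
                   min_total: float = 0.05) -> str:
    """Name the larger contributor; very small degradations stay unattributed.

    min_total is a hypothetical threshold, not a documented server setting.
    """
    if jitter_points + loss_points < min_total:
        return "unattributed"
    return "jitter" if jitter_points >= loss_points else "packet loss"


siren = network_degradation("Siren", 3.60)
rtaudio_nb = network_degradation("RTAudio NB", 2.90)
print(round(siren.points, 2), round(rtaudio_nb.points, 2))  # 0.12 0.05
```

In this example, a Siren call averaging 3.60 has lost 0.12 points. That is more than the 0.05 points lost by an RTAudio NB call averaging 2.90, even though the Siren call's raw Network MOS is higher.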

Conversational MOS

Conversational MOS is a prediction of the narrowband Conversational Quality (MOS-CQ) of the audio stream that is played to the user. This value takes into consideration the listening quality of the audio played and sent across the network, the speech and noise levels for both audio streams, and echoes. It represents how a large group of people would rate the quality of the connection for holding a conversation.

The Conversational MOS varies depending on the same factors as Listening MOS, as well as on the following:

  • Echo

  • Network delay

  • Delay due to jitter buffering

  • Delay due to devices

Because so many factors influence this value, Conversational MOS is most useful when viewed statistically across many calls rather than as a single value.

Interpreting the MOS Metrics

Together, the MOS values and associated metrics provide a detailed view of the quality of experience being delivered to end users and can be used to identify a wide range of issues. The basic approach is to compare the current MOS metrics either against previously known good states or against similar conditions. Combined with filtering by location, time period, call type, and so on, this comparison can narrow down the root cause and point to further investigation using the detailed metrics or other troubleshooting tools.
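The comparison-plus-filtering approach can be sketched as follows. `CallRecord` and its fields are hypothetical stand-ins for whatever per-call data you export from the reports. The filter selects comparable calls. The result is the difference between a known-good baseline period and the current period.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, Sequence


@dataclass
class CallRecord:
    """Hypothetical per-call record; real report columns may differ."""
    location: str
    call_type: str
    network_mos: float


def mean_network_mos(calls: Sequence[CallRecord],
                     keep: Callable[[CallRecord], bool]) -> float:
    """Average Network MOS over the calls that pass the filter."""
    selected = [c.network_mos for c in calls if keep(c)]
    if not selected:
        raise ValueError("filter matched no calls")
    return fmean(selected)


def mos_drop(current: Sequence[CallRecord], baseline: Sequence[CallRecord],
             keep: Callable[[CallRecord], bool]) -> float:
    """Baseline mean minus current mean; a positive value means quality got worse."""
    return mean_network_mos(baseline, keep) - mean_network_mos(current, keep)


def building_2_conferences(call: CallRecord) -> bool:
    # Example filter: one location and one call type (values are hypothetical).
    return call.location == "Building 2" and call.call_type == "conference"


baseline = [CallRecord("Building 2", "conference", 3.6),
            CallRecord("Building 2", "conference", 3.6)]
current = [CallRecord("Building 2", "conference", 3.2),
           CallRecord("Building 1", "conference", 2.0)]
print(round(mos_drop(current, baseline, building_2_conferences), 2))  # 0.4
```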

The following are a few examples of issues and how they can be identified through analysis of the metrics.

LAN Congestion

As a LAN becomes more congested with traffic, packet loss and jitter increase for calls that pass through it. This increase is reflected in lower Network MOS values and higher average degradation for those calls. The QoE trend reports show the lower Network MOS values over the past several weeks, which can help identify the LAN that is showing signs of congestion. The call list report for calls on that LAN will show higher degradation, jitter, and packet loss compared with calls made before the LAN became congested, or compared with calls made on similar uncongested networks.
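The sketch below shows the trend check described above. It groups Network MOS by LAN and ISO week, then flags LANs whose most recent weekly average has fallen well below their earliest one. The input tuple shape, the function names, and the 0.2-point threshold are assumptions for illustration, not settings of the QoE trend reports.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

WeekKey = Tuple[int, int]  # (ISO year, ISO week)


def weekly_network_mos(
    calls: Iterable[Tuple[str, date, float]]
) -> Dict[str, Dict[WeekKey, float]]:
    """Average Network MOS per LAN per ISO week from (lan, call_date, network_mos)."""
    buckets: Dict[str, Dict[WeekKey, List[float]]] = defaultdict(lambda: defaultdict(list))
    for lan, call_date, nmos in calls:
        iso_year, iso_week, _ = call_date.isocalendar()
        buckets[lan][(iso_year, iso_week)].append(nmos)
    return {lan: {week: fmean(vals) for week, vals in weeks.items()}
            for lan, weeks in buckets.items()}


def declining_lans(weekly: Dict[str, Dict[WeekKey, float]],
                   threshold: float = 0.2) -> List[str]:
    """LANs whose latest weekly mean fell more than `threshold` below their first."""
    flagged = []
    for lan, series in weekly.items():
        weeks = sorted(series)
        if len(weeks) >= 2 and series[weeks[0]] - series[weeks[-1]] > threshold:
            flagged.append(lan)
    return sorted(flagged)


calls = [
    ("LAN-A", date(2024, 1, 1), 3.9), ("LAN-A", date(2024, 1, 2), 3.9),
    ("LAN-A", date(2024, 1, 22), 3.5),
    ("LAN-B", date(2024, 1, 1), 3.9), ("LAN-B", date(2024, 1, 22), 3.85),
]
print(declining_lans(weekly_network_mos(calls)))  # ['LAN-A']
```

Using the earliest week as the baseline is the simplest choice. A known-good reference period would match the guidance above more closely.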

Bad Devices or Device Drivers

The audio quality of a call is affected by the microphone and the associated driver used to capture the speaker's audio. If a new device, or a new driver for an existing device, is deployed and captures audio at lower quality, this is reflected in a lower Sending MOS. In the device report, the Sending MOS for these devices can be compared with other devices and with historical data to isolate a problematic device or driver, which can then be addressed. Note that to identify a problematic device or driver, it must be deployed and used widely enough to generate sufficient data for analysis. A problematic device that is rarely used will not be identifiable in this report.
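The device comparison, including the minimum-usage requirement, can be sketched as below. `SendingSample`, the `min_calls` default of 30, and the 0.3-point `margin` are hypothetical choices. The key step is that device/driver pairs with too few calls are excluded before their average Sending MOS is compared with the rest of the fleet.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

DeviceKey = Tuple[str, str]  # (device name, driver version)


@dataclass
class SendingSample:
    """Hypothetical per-call record of the capture device and its Sending MOS."""
    device: str
    driver: str
    sending_mos: float


def device_report(samples: Iterable[SendingSample],
                  min_calls: int = 30) -> Dict[DeviceKey, float]:
    """Mean Sending MOS per device/driver pair, skipping pairs with too few calls."""
    groups: Dict[DeviceKey, List[float]] = defaultdict(list)
    for s in samples:
        groups[(s.device, s.driver)].append(s.sending_mos)
    return {key: fmean(vals) for key, vals in groups.items() if len(vals) >= min_calls}


def suspect_devices(report: Dict[DeviceKey, float],
                    margin: float = 0.3) -> List[DeviceKey]:
    """Pairs whose mean is more than `margin` below the fleet average.

    The fleet average here is the unweighted mean of the per-pair means.
    """
    if not report:
        return []
    fleet_mean = fmean(report.values())
    return sorted(key for key, mean in report.items() if fleet_mean - mean > margin)
```

Raising `min_calls` makes the comparison more reliable, at the cost of detecting newly deployed devices later.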

Additional Metrics

Table 2 lists some additional metrics that are collected by the QoE Monitoring Server.

Table 2. Metrics and Descriptions

Metric Description

Degradation average

Average fraction of Network MOS degradation for the codec used, over the entire call. The greater the degradation value, the more the network has degraded the audio experience. Degradation average is used in the call list and trend reports.

Jitter

Variation in the delay of packets arriving at their destination. VoIP packets are sent at regular intervals from the sender to the receiver, but because network latency varies, the interval between packets can differ at the destination. This variation can affect media quality. Jitter is used to calculate MOS values and is shown in the call detail report.

Degradation jitter average

Average fraction of Network MOS degradation caused by jitter on the network during the call, that is, the portion of Degradation average attributable to jitter. By examining this field, you can determine whether jitter was the major contributor to Network MOS degradation.

Packet loss

Ratio of VoIP packets lost to the total number of VoIP packets sent. Packet loss is used in calculating MOS values, in the call detail report, and in performance counters.

Degradation packet loss average

Average fraction of Network MOS degradation caused by packet loss on the network during the call, that is, the portion of Degradation average attributable to packet loss. By examining this field, you can determine whether packet loss was the major contributor to Network MOS degradation.

Delay

Average round-trip time during the call for a packet to travel over the network from the sender to the receiver and back. Delay is used in the worst-performing endpoint calculation and is displayed in call list reports.

Video bit rate

Average rate, in bits per second, of the encoding process for the video image. Video bit rate is used in performance counters as well as the call detail report.

Video frame loss

Average number of unique consecutive images, or video frames, lost during the call. Because video frames can span multiple packets, this value can be more useful than packet loss in evaluating video quality. Video frame loss is used in the call detail report.

Video frame rate

Rate, in frames per second, at which frames are produced in the call. Video frame rate is used in the call detail report.

Bandwidth estimation

Estimated available bandwidth in the call. Bandwidth estimation is used in the call detail report.

Burst density

The fraction of RTP (Real-Time Transport Protocol) data packets within burst periods since the beginning of reception that were either lost or discarded. A burst period is a period in which a high proportion of packets are either lost or discarded due to late arrival. Burst density is used in the call detail report.

Burst length

The mean duration, expressed in milliseconds, of the burst periods that have occurred since the beginning of reception. Burst length is used in the call detail report.

Gap density

The fraction of RTP data packets in the gaps between bursts since the beginning of reception that were either lost or discarded. Gap density is used in the call detail report.

Gap length

The mean duration, expressed in milliseconds, of the gap periods that have occurred since the beginning of reception. Gap length is used in the call detail report.
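This section does not say exactly how the QoE Monitoring Server computes jitter. A common definition for RTP traffic is the interarrival jitter from RFC 3550 (section 6.4.1). It smooths the change in transit time between consecutive packets with a gain of 1/16. The sketch below implements that formula, assuming send and arrival times are available in the same unit (here, milliseconds). It is not necessarily the server's calculation.

```python
from typing import Iterable, Optional, Tuple


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """RFC 3550 interarrival jitter, in the same units as the input times.

    packets: (send_time, arrival_time) pairs for consecutive packets, in order.
    """
    jitter = 0.0
    previous: Optional[Tuple[float, float]] = None
    for sent, arrived in packets:
        if previous is not None:
            # Change in transit time between this packet and the previous one.
            d = (arrived - previous[1]) - (sent - previous[0])
            # Exponential smoothing with gain 1/16, as RFC 3550 specifies.
            jitter += (abs(d) - jitter) / 16
        previous = (sent, arrived)
    return jitter


# Packets sent every 20 ms; the second one takes 16 ms longer in transit.
print(interarrival_jitter([(0, 50), (20, 86), (40, 90)]))  # 1.9375
```

A constant network delay produces zero jitter, however large the delay is. Only variation in delay contributes.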
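The burst and gap definitions in Table 2 appear to follow the VoIP metrics of RFC 3611. There, a burst starts and ends with a lost or discarded packet and may not contain Gmin or more consecutive received packets; RFC 3611 recommends Gmin = 16. The sketch below computes the four burst and gap metrics from a per-packet loss sequence. It makes two simplifications of its own, neither of which is documented server behavior:

  • It assumes a fixed packet interval (20 ms by default) to convert packet counts to milliseconds.
  • It treats a lone lost packet as a gap loss rather than a one-packet burst.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BurstGapStats:
    burst_density: float    # fraction of packets in bursts that were lost/discarded
    gap_density: float      # fraction of packets in gaps that were lost/discarded
    burst_length_ms: float  # mean burst duration
    gap_length_ms: float    # mean gap duration


def find_bursts(lost: Sequence[bool], gmin: int = 16) -> List[Tuple[int, int]]:
    """Inclusive (first, last) packet indexes of each burst.

    A burst starts and ends with a lost packet and contains no run of gmin or
    more received packets. A lone lost packet stays in its gap (assumption).
    """
    bursts: List[Tuple[int, int]] = []
    start = last_loss = -1
    losses = received_run = 0
    for i, was_lost in enumerate(lost):
        if was_lost:
            if start < 0:  # first loss of a candidate burst
                start, losses = i, 0
            last_loss = i
            losses += 1
            received_run = 0
        else:
            received_run += 1
            if start >= 0 and received_run >= gmin:
                # gmin received packets in a row close the candidate burst.
                if losses > 1:
                    bursts.append((start, last_loss))
                start = -1
    if start >= 0 and losses > 1:
        bursts.append((start, last_loss))
    return bursts


def burst_gap_stats(lost: Sequence[bool], packet_ms: float = 20.0,
                    gmin: int = 16) -> BurstGapStats:
    bursts = find_bursts(lost, gmin)
    in_burst = [False] * len(lost)
    for first, last in bursts:
        for i in range(first, last + 1):
            in_burst[i] = True

    burst_pkts = sum(in_burst)
    burst_lost = sum(1 for b, was_lost in zip(in_burst, lost) if b and was_lost)
    gap_pkts = len(lost) - burst_pkts
    gap_lost = sum(lost) - burst_lost

    # Gaps are the maximal runs of packets that are not inside a burst.
    gap_runs: List[int] = []
    run = 0
    for b in in_burst:
        if b:
            if run:
                gap_runs.append(run)
            run = 0
        else:
            run += 1
    if run:
        gap_runs.append(run)

    return BurstGapStats(
        burst_density=burst_lost / burst_pkts if burst_pkts else 0.0,
        gap_density=gap_lost / gap_pkts if gap_pkts else 0.0,
        burst_length_ms=burst_pkts * packet_ms / len(bursts) if bursts else 0.0,
        gap_length_ms=gap_pkts * packet_ms / len(gap_runs) if gap_runs else 0.0,
    )


# 3 of 4 packets lost close together, then one isolated loss later on.
lost = [False] * 20 + [True, False, True, True] + [False] * 20 + [True] + [False] * 20
print(find_bursts(lost))  # [(20, 23)]
```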