This feature is related to the SIP INFO Method for DTMF Tone Generation feature, which adds support for
out-of-band DTMF tone generation using the SIP INFO method. Together, the two features provide a mechanism
to both send and receive DTMF digits along the signaling path. The SPA-DSP supports the detection and
reporting of inband DTMF tones and their conversion to RFC 2833-based DTMF tones or out-of-band signaling
(and vice versa).
Note: The SPA-DSP supports the conversion from RFC 2833 DTMF tones to inband tones.
Managing Jitters for Voice Packets
This section explains how the SPA-DSP manages jitter in voice packets to provide a smooth flow of voice
streams. Jitter is defined as the variation in the delay of received packets. In a packet-voice environment, the
sender is expected to reliably transmit voice packets at a regular interval (for example, one frame every
20 ms). These voice packets can be delayed as they cross the packet network and might not arrive at the
receiving station at that same regular interval (for example, they might not be received every 20 ms). The
difference between when a packet is expected and when it is actually received is defined as jitter.
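The definition above can be illustrated with a small calculation. This is a minimal sketch, not the SPA-DSP implementation: the function name arrival_jitter_ms is hypothetical, and it assumes the expected arrival times are anchored to the first packet and spaced at the 20 ms interval used in the example.

```python
def arrival_jitter_ms(arrival_times_ms, interval_ms=20.0):
    """Return, per packet, actual arrival time minus expected arrival time.

    Assumption (not from the guide): the expected time of packet i is the
    arrival time of the first packet plus i * interval_ms.
    """
    if not arrival_times_ms:
        return []
    first = arrival_times_ms[0]
    return [t - (first + i * interval_ms)
            for i, t in enumerate(arrival_times_ms)]


# Packets sent every 20 ms; the third arrives 5 ms late, the fifth 8 ms late.
arrivals = [0.0, 20.0, 45.0, 60.0, 88.0]
print(arrival_jitter_ms(arrivals))  # [0.0, 0.0, 5.0, 0.0, 8.0]
```

A positive value means the packet arrived later than expected; a value of zero means it arrived on schedule.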
To handle jitter, the SPA-DSP maintains a jitter buffer that stores a certain number of voice frames and
waits for voice frames that arrive late. The jitter buffer size is determined by counting the number of
packets that arrive late and computing the ratio of late-arriving packets to packets that are successfully
processed. This ratio is used to adjust the jitter buffer size so that the late-packet ratio stays within a
predetermined, allowable limit. After the jitter buffer has filled with voice packets, it plays out the
RTP audio stream to the SPA-DSP at a steady rate, so that the SPA-DSP can convert the packets into
a steady audio stream.
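The sizing idea described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 1 percent target ratio, the 20 ms step, and the 20 ms to 200 ms bounds are all hypothetical and are not taken from the SPA-DSP.

```python
def late_ratio(late, processed):
    """Ratio of late-arriving packets to successfully processed packets."""
    return late / processed if processed else 0.0


def adjust_buffer_ms(current_ms, late, processed,
                     target_ratio=0.01, step_ms=20, min_ms=20, max_ms=200):
    """Grow or shrink the jitter buffer depth based on the late-packet ratio.

    All numeric defaults are assumptions chosen for illustration.
    """
    if processed == 0:
        return current_ms              # no measurements yet, keep the size
    ratio = late_ratio(late, processed)
    if ratio > target_ratio:
        current_ms += step_ms          # too many late packets: buffer more
    elif ratio < target_ratio / 2:
        current_ms -= step_ms          # comfortably below target: cut delay
    # Keep the result within the assumed bounds.
    return max(min_ms, min(max_ms, current_ms))


print(adjust_buffer_ms(60, late=5, processed=100))  # 80
print(adjust_buffer_ms(60, late=0, processed=100))  # 40
```

A larger buffer absorbs more delay variation at the cost of added playout delay, which is why the sketch also shrinks the buffer when few packets arrive late.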
Comfort Noise and Voice Activity Detection (VAD)
This section discusses how voice packets and silence periods are handled during a voice call. It also
explains how the resulting voice quality issues can be addressed by using the voice activity detection
(VAD) feature. IP-based telephony systems need a voice activity detector to detect silence periods in the
voice signal and temporarily discontinue transmission of the signal during those periods. This saves
bandwidth and allows the far end to adjust its jitter buffer. The downside is that during silence periods, the
far-end phone has to generate its own signal to play to the listener. Usually, comfort noise is played out to the
listener to mask the absence of an audio signal from the far end. Comfort noise is usually modeled on the
far-end noise so that there is no stark contrast when the audio switches from the actual background noise to the
comfort noise.
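As a rough illustration of silence detection, the sketch below marks each frame for transmission or suppression by comparing its RMS energy to a fixed threshold. It is a simplified stand-in, not the SPA-DSP algorithm: the function names and the threshold of 500 (for 16-bit samples) are assumptions, and production detectors typically use adaptive thresholds and hangover periods.

```python
import math


def frame_rms(samples):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def frames_to_send(frames, threshold=500.0):
    """Return one bool per frame: True = transmit, False = suppress.

    The threshold is a hypothetical value chosen for illustration.
    """
    return [frame_rms(f) >= threshold for f in frames]


speech = [4000, -3500, 3000, -4200]   # loud frame
silence = [20, -15, 10, -25]          # near-silent frame
print(frames_to_send([speech, silence, speech]))  # [True, False, True]
```

Frames marked False are the ones whose transmission would be discontinued, which is where the bandwidth saving comes from.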
There are two situations in which comfort noise is injected into a voice call. The foremost is the use of VAD.
Whenever VAD kicks in, comfort noise packets are introduced into the audio stream. The second situation
(not a major contributor) is the activation of echo cancellation. Whenever echo cancellation becomes active,
comfort noise packets are introduced into the audio stream. The characteristics of these comfort noise packets
are determined by an algorithm that monitors ongoing speech and receives a signature of the
background noise.
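The idea of modeling comfort noise on the measured background can be sketched as follows. This is a deliberately simple assumption-laden example: it matches only the overall level of the background noise using Gaussian samples, whereas a real noise signature would normally also capture spectral shape. The names noise_level and comfort_noise are hypothetical.

```python
import math
import random


def noise_level(background_frames):
    """RMS over all samples in frames believed to hold only background noise."""
    total = sum(s * s for frame in background_frames for s in frame)
    count = sum(len(frame) for frame in background_frames)
    return math.sqrt(total / count) if count else 0.0


def comfort_noise(level, n_samples, seed=None):
    """Generate Gaussian noise whose standard deviation equals the given level."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, level) for _ in range(n_samples)]


# Estimate the level from two short background frames, then synthesize
# one 20 ms frame at 8 kHz (160 samples) at roughly the same level.
level = noise_level([[30, -28, 25, -31], [27, -29, 33, -26]])
frame = comfort_noise(level, 160, seed=1)
print(len(frame))  # 160
```

Matching the level is what avoids the stark contrast described above when playback switches from real background noise to generated noise.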
The SPA-DSP provides VAD and comfort noise functionality by default. To enable the local VAD
settings, use the vad on override command in config-dspfarm-profile mode. The vad on
override command overrides the external VAD settings. To disable the local VAD settings, use the vad
off override command in config-dspfarm-profile mode.
Cisco ASR 1000 Series Aggregation Services Routers SIP and SPA Software Configuration Guide, Cisco IOS
XE Everest 16.5
Overview of the Cisco DSP SPA for the ASR 1000 Series Aggregation Services Routers
OL-14127-17