BUFFERING FOR AUDIO / VIDEO TRANSMISSION OVER THE INTERNET

Transmitting real-time audio and video over the Internet is difficult because of packet loss and jitter. These parameters vary depending on the locations of the senders and receivers, with typical packet loss rates of 0-20% and one-way delays of 5-500 ms. Delay variations occur within and across audio and video streams, complicating the synchronization process. One way to reduce jitter is to buffer audio and video packets at the receiver, so that slower packets arrive in time to be played out in the correct sequence at the appropriate times. This paper presents several adaptive playout buffer algorithms that minimize the effect of delay jitter. We evaluate their effectiveness through experiments on a real network and compare their performance in terms of the trade-off between delay and packet loss. Although the main focus of this paper is playout buffering for audio, the synchronization between audio and video streams is also described.

1. Fixed and adaptive jitter buffering

Removing jitter involves collecting packets and holding them in a jitter buffer. This allows slower packets to arrive in time to be played out at the appropriate times. In general, the larger the jitter buffer, the greater the added delay and the more packets are successfully played out. Unfortunately, this additional delay lowers the perceived QoS. On the other hand, if the playout delay is set too low, the network-induced delay will cause some packets to arrive too late for playout and thus be lost, which also lowers the perceived QoS. The main objective of jitter buffering is to keep the packet loss rate under 5% while keeping the end-to-end delay as small as possible.

The playout buffer delay may be kept fixed or adjusted adaptively during the transmission. Although a fixed method, which uses a fixed buffer size, is easier to implement than an adaptive method, it can result in unsatisfactory audio or video quality.
This is because there is no single optimal delay when network conditions vary over time. The fluctuating end-to-end delays experienced over the Internet may cause latency to increase to a level that annoys users (when the buffer is too large), or may cause packets to be lost because they arrive late (when the buffer is too small).

Adaptive techniques continuously estimate the network delay and adjust the playout delay at the beginning of each talkspurt, so that the change is absorbed by the silent period between talkspurts. The adjustment is applied to the first packet of the talkspurt; all subsequent packets in the same talkspurt are scheduled to play out at fixed intervals following the playout of the first packet. This mechanism uses the same playout delay throughout a given talkspurt but permits different playout delays for different talkspurts. Varying the playout delay may artificially lengthen or shorten silence periods, but such modification is considered acceptable in the perceived speech as long as the variation is reasonably limited.

2. Adaptive playout algorithms

An effective way to choose the buffering delay is to adapt it to the delay characteristics of the network. Since the current delay characteristics are not known a priori, adaptive algorithms calculate the playout time of each incoming talkspurt from the delays experienced by already-received packets. In this section we describe seven different algorithms. We ran all of them on the same set of data, so we were able to compare their performance under identical network conditions. We collected experimental data at the receiving host in Dublin, Ireland, while transmitting audio and video packets from a terminal in Poland. We used the G.723.1 encoding scheme for audio and H.261 for video.
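The per-talkspurt mechanism shared by these algorithms can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `Packet`, `schedule_playout` and `estimate_delay` are hypothetical, sender and receiver timestamps are assumed to be expressed on a common clock in milliseconds, and the estimator used in the example is only a placeholder into which any of the adaptive algorithms could be plugged.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Packet:
    send_ms: float          # sender timestamp (assumed common clock, ms)
    arrival_ms: float       # arrival time at the receiver (ms)
    starts_talkspurt: bool  # True for the first packet of a talkspurt


def schedule_playout(
    packets: List[Packet],
    estimate_delay: Callable[[List[float]], float],
) -> List[Optional[float]]:
    """Return a playout time per packet, or None if it arrived too late."""
    history: List[float] = []  # network delays of all packets seen so far
    playout_delay = 0.0
    schedule: List[Optional[float]] = []
    for pkt in packets:
        # Delay statistics are updated on every arrival ...
        history.append(pkt.arrival_ms - pkt.send_ms)
        # ... but the playout delay changes only when a talkspurt starts.
        if pkt.starts_talkspurt:
            playout_delay = estimate_delay(history)
        # Within a talkspurt, packets keep their original spacing.
        playout_ms = pkt.send_ms + playout_delay
        schedule.append(playout_ms if pkt.arrival_ms <= playout_ms else None)
    return schedule


if __name__ == "__main__":
    trace = [
        Packet(0, 50, True), Packet(20, 75, False), Packet(40, 120, False),
        Packet(200, 260, True), Packet(220, 275, False),
    ]
    # Placeholder estimator (an assumption): largest delay so far plus 10 ms.
    print(schedule_playout(trace, lambda delays: max(delays) + 10))
    # prints [60, 80, None, 290, 310]
```

In this example the third packet misses its playout time and counts as lost, and the second talkspurt starts with a larger playout delay because the late packet has raised the estimate.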
1 Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland, www.cs.ucd.ie, {miroslaw.narbutt, liam.murphy}@ucd.ie

Algorithm 1

The first four algorithms were proposed by Ramjee et al. in 1994 [1]. Let $n_i$ be the total delay of audio packet $i$ introduced by the network. Estimates of the average network delay $\hat{d}_i$ and of the average delay variation $\hat{v}_i$ are updated for each incoming packet:

$$\hat{d}_i = A \, \hat{d}_{i-1} + (1 - A) \, n_i$$

$$\hat{v}_i = A \, \hat{v}_{i-1} + (1 - A) \, |\hat{d}_i - n_i|$$

These estimates are recomputed each time a packet arrives, but they are used only when a new talkspurt begins. When a new talkspurt is detected, the algorithm uses the most recent values to calculate the playout delay of the first packet in the talkspurt:

$$p = \hat{d}_i + B \, \hat{v}_i$$

Any subsequent packets of that talkspurt are played out at a rate equal to the generation rate at the sender. The constant A is a fixed weighting factor that characterizes the memory of the estimator. To limit sensitivity to short-term packet jitter, A is usually chosen to be 0.99802. B is a variation coefficient that controls the trade-off between delay and packet loss, and is usually chosen to be 4. The larger this coefficient, the more packets are played out, at the expense of longer delays.

The figures below show the calculated playout times (darker line) and the network delays (dots) of received packets. Packets whose delays are above the darker line are lost. All the others are successfully played out.

Fig. 1. Playout times calculated by Algorithm 1

Algorithm 2

The second algorithm is similar to the first one but adapts more quickly to short bursts of packets incurring long delays. The idea is to use two values of the weighting factor: a smaller one (A_BIS) for increasing trends in the delay and a bigger one (A) for decreasing trends.

Fig. 2. Playout times calculated by Algorithm 2

Algorithm 3

The third algorithm attempts to be more aggressive in minimizing delays.
Instead of using a running estimate of network delays, it uses the minimum network delay of all packets received in the previous talkspurt as the average delay.

Fig. 3. Playout times calculated by Algorithm 3

Algorithm 3b

This algorithm, proposed by us, is a modification of Algorithm 3. It uses the maximum delay of all packets received in the previous talkspurt as the estimate of the average delay. This modification minimizes packet loss.

Fig. 4. Playout times calculated by Algorithm 3b

Algorithm 4

This algorithm detects spikes: steep rises in network delay followed by a monotonic decrease back to the normal level. It has two modes of operation, depending on whether a spike has been detected. If a packet arrives with a delay that is larger than a given threshold (e.g. some multiple of the current playout delay), the algorithm switches to spike mode. The two modes differ in how the estimate of network delay is updated. In normal mode, running estimates of the average delay and its variation are maintained (as in Algorithm 1). During a spike, the delay estimate tracks the observed delays more closely.

Fig. 5. Playout times calculated by Algorithm 4

Algorithm 5

In 1995 Moon et al. [2] proposed an algorithm that collects the network delays of already-received packets in order to estimate the playout delay. The delays of the last K packets are recorded, and the distribution of delays is updated with each incoming packet. The frequency of each delay is maintained in a histogram. When a new packet arrives, the delay of the oldest packet is removed from the histogram and the delay of the newest is added. The delay distribution is computed as a cumulative sum of the frequencies, and this is done only at the beginning of a new talkspurt. The algorithm finds a given percentile point of the delay distribution and uses it as the playout delay for the new talkspurt. Algorithm 5 also detects spikes. Once a spike is detected, it stops collecting packet delays.
If a new talkspurt begins during a spike, it uses the delay of the first packet of the talkspurt as the playout delay for that talkspurt.

Fig. 6. Playout times calculated by Algorithm 5

The number of recorded packet delays determines how sensitive the algorithm is to changes. If it is too small, the algorithm is likely to produce a poor estimate of the playout delay. If it is too large, the algorithm will keep track of an unnecessarily large amount of past history.

Algorithm 6

This algorithm, proposed in 1999 by Pinto and Christensen [3], is intended to target any desired loss rate. It adapts the buffering delay based on the arrival and playout times of packets received in the previous talkspurt only. The playout delay is taken directly from the ordered list of delays of the previous talkspurt: it is the minimum delay that would have been required to play out the previous talkspurt at exactly the desired packet loss rate. Like the two previous algorithms, Algorithm 6 operates in two modes. In spike mode it uses the delay of the first packet of a talkspurt as the playout delay for that talkspurt.

Fig. 7. Playout times calculated by Algorithm 6

3. Audio/video synchronization process

Audio and video packets are sent across the Internet using UDP, a best-effort transport protocol, together with the application-layer protocol RTP. Each RTP header contains a timestamp, a sequence number, a marker bit, and a source identifier that distinguishes the different streams. All of these fields are used during the synchronization process. For example, the sequence number is necessary to detect packet losses, the timestamp is needed for inter-stream and intra-stream synchronization, and the marker bit indicates the beginning of a talkspurt.

The idea behind the audio-video synchronization process is that the adaptive playout algorithms are applied to the audio stream first, and the video frames are then played out according to the playout times of their corresponding audio packets.
This is done by storing the video frames in a video playout buffer and delaying each frame until the corresponding audio packets are played out. The correspondence between audio and video frames is given by their timestamps. If the video quality is not acceptable using this scheme, which gives priority to audio, additional
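The audio-driven scheme described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: `schedule_video` is a hypothetical name, audio and video timestamps are assumed to be already mapped to a common clock in milliseconds, and a video frame that arrives after its target time is simply marked as late (what should happen to such frames is not specified in the text).

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def schedule_video(
    audio_schedule: List[Tuple[float, float]],
    video_frames: List[Tuple[float, float]],
) -> List[Optional[float]]:
    """Derive video playout times from an existing audio playout schedule.

    audio_schedule: (timestamp_ms, playout_ms) of played-out audio packets,
                    sorted by timestamp.
    video_frames:   (timestamp_ms, arrival_ms) of received video frames.
    Returns one playout time per frame, or None if the frame has no
    matching audio packet or arrives after its target time.
    """
    audio_ts = [ts for ts, _ in audio_schedule]
    result: List[Optional[float]] = []
    for ts, arrival_ms in video_frames:
        # Corresponding audio packet: the latest one at or before the frame.
        k = bisect_right(audio_ts, ts) - 1
        if k < 0:
            result.append(None)
            continue
        a_ts, a_playout = audio_schedule[k]
        # Hold the frame in the video buffer until the matching audio plays,
        # preserving the frame's offset from that audio packet.
        target_ms = a_playout + (ts - a_ts)
        result.append(target_ms if arrival_ms <= target_ms else None)
    return result


if __name__ == "__main__":
    audio = [(0, 60), (30, 90), (60, 120)]
    video = [(0, 40), (33, 100), (66, 110)]
    print(schedule_video(audio, video))
    # prints [60, None, 126]
```

Here the second frame arrives at 100 ms, after its audio-derived target of 93 ms, so it is reported as late; the other two frames wait in the buffer until their corresponding audio is played.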

[1]  Henning Schulzrinne, et al. Adaptive playout mechanisms for packetized audio applications in wide-area networks, 1994, Proceedings of INFOCOM '94 Conference on Computer Communications.

[2]  Donald F. Towsley, et al. Packet audio playout delay adjustment: performance bounds and algorithms, 1998, Multimedia Systems.

[3]  Kenneth J. Christensen, et al. An algorithm for playout of packet voice based on adaptive adjustment of talkspurt silence periods, 1999, Proceedings 24th Conference on Local Computer Networks. LCN'99.