Packet loss protection for interactive audio object rendering: A multiple description approach

This paper presents a new framework for compressing and transmitting simultaneously occurring audio objects over packet-loss channels while maintaining user Quality of Experience (QoE). The audio objects are compressed into two mono mixtures by exploiting the sparsity of multichannel audio signals to identify the two most dominant time-frequency components, ranked by an energy measure. These mixtures are further compressed using the MP3 audio codec, and the optimised transmission model is selected from several channel coding models based on the Forward Error Correction (FEC) and Multiple Description Coding (MDC) packet loss protection techniques. The audio objects can be recovered robustly from whichever description(s) are received, allowing real-time selective reproduction at the listeners' end. Spectral distortion measurements indicate that the proposed scheme maintains the perceptual quality of the audio objects across a wide variety of packet loss conditions.
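The downmix idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the objects are already available as short-time spectra, it keeps the two mixtures in the time-frequency domain rather than converting them back to mono waveforms, it omits the MP3 and FEC/MDC stages, and it assumes that a per-bin object index map travels with each mixture as side information. The function names `downmix_two_dominant` and `recover_objects` are hypothetical.

```python
import numpy as np


def downmix_two_dominant(spectra):
    """Keep the two highest-energy objects at every time-frequency bin.

    spectra: complex array of shape (n_objects, n_freq, n_frames).
    Returns two mixtures and, for each, a map of the object index per bin
    (the index maps are an assumed form of side information).
    """
    energy = np.abs(spectra) ** 2
    # Object indices sorted by descending energy at each bin.
    order = np.argsort(-energy, axis=0, kind="stable")
    idx1, idx2 = order[0], order[1]
    mix1 = np.take_along_axis(spectra, idx1[None], axis=0)[0]
    mix2 = np.take_along_axis(spectra, idx2[None], axis=0)[0]
    return mix1, mix2, idx1, idx2


def recover_objects(n_objects, descriptions):
    """Rebuild object spectra from whichever descriptions arrived.

    descriptions: non-empty list of (mixture, index_map) pairs.
    Bins not covered by any received description are left at zero.
    """
    shape = descriptions[0][0].shape
    out = np.zeros((n_objects,) + shape, dtype=complex)
    for mix, idx in descriptions:
        for k in range(n_objects):
            # Route each bin of the mixture back to the object it came from.
            out[k] += np.where(idx == k, mix, 0)
    return out


# Example with synthetic data (random spectra stand in for real audio objects).
rng = np.random.default_rng(0)
objects = rng.standard_normal((3, 8, 10)) + 1j * rng.standard_normal((3, 8, 10))
mix1, mix2, idx1, idx2 = downmix_two_dominant(objects)
both = recover_objects(3, [(mix1, idx1), (mix2, idx2)])  # both descriptions received
only_first = recover_objects(3, [(mix1, idx1)])          # second description lost
```

In this sketch, losing one description removes only the components that description carried, so every object remains partially recoverable from the other one. That is the behaviour the multiple description approach aims for, although the paper's actual coding models may distribute information differently.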
