A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity

Rendering spatial sound scenes via audio objects has become popular in recent years, since it provides more flexibility for different auditory scenarios, such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality, bitrate-efficient distribution of spatial audio objects, this paper proposes an encoding scheme based on intra-object sparsity (the approximate k-sparsity of each audio object itself). A statistical analysis is presented to validate the notion that an audio object is sparser in the Modified Discrete Cosine Transform (MDCT) domain than in the Short Time Fourier Transform (STFT) domain. By exploiting intra-object sparsity in the MDCT domain, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. To ensure balanced perceptual quality across the audio objects, a psychoacoustic-based time-frequency instant sorting algorithm and an energy-equalized Number of Preserved Time-Frequency Bins (NPTF) allocation strategy are proposed and employed in the underlying compression framework. The downmix signal can be further encoded via the Scalar Quantized Vector Huffman Coding (SQVH) technique at a desired bitrate, and the side information is transmitted losslessly. Both objective and subjective evaluations show that the proposed encoding scheme outperforms the Sparsity Analysis (SPA) approach and Spatial Audio Object Coding (SAOC) in cases where eight objects are jointly encoded.
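The idea of folding several sparse objects into one downmix plus side information can be illustrated with a minimal sketch. It is not the scheme proposed in the paper: it assumes a simplified rule in which each time-frequency bin of a frame keeps only the object with the largest coefficient magnitude, and it does not model the psychoacoustic sorting, the NPTF allocation or the SQVH coding. The function names `sparse_downmix` and `recover_objects` and the toy coefficients are hypothetical.

```python
import numpy as np

def sparse_downmix(objects):
    """Fold one frame of several objects into a mono downmix.

    objects: array of shape (num_objects, num_bins) holding transform
    coefficients (e.g. MDCT) for a single frame.
    Assumed rule (not from the paper): each bin keeps only the object
    with the largest magnitude in that bin.
    Returns (downmix, owner), where owner[b] is the index of the object
    whose coefficient was kept in bin b (the side information).
    """
    objects = np.asarray(objects, dtype=float)
    bins = np.arange(objects.shape[1])
    owner = np.argmax(np.abs(objects), axis=0)
    downmix = objects[owner, bins]
    return downmix, owner

def recover_objects(downmix, owner, num_objects):
    """Rebuild sparse per-object frames from the downmix and side info."""
    bins = np.arange(downmix.shape[0])
    recovered = np.zeros((num_objects, downmix.shape[0]))
    recovered[owner, bins] = downmix
    return recovered

# Toy frame: 3 objects, 4 bins, each object mostly active in its own bins.
frame = np.array([
    [0.9,  0.0, 0.1, 0.00],
    [0.0, -0.7, 0.0, 0.05],
    [0.1,  0.0, 0.0, 0.60],
])
downmix, owner = sparse_downmix(frame)
recovered = recover_objects(downmix, owner, frame.shape[0])
```

In this toy frame the coefficients that are dropped (for example the 0.1 of the third object in the first bin) are small compared with the ones that are kept, which is the situation the sparsity assumption relies on. With less sparse objects the same rule would discard more energy.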
