JND-based spatial parameter quantization of multichannel audio signals

In multichannel spatial audio coding (SAC), accurate representation of virtual sounds and efficient compression of spatial parameters are key to faithful reproduction of spatial sound effects in 3D space. The just noticeable difference (JND) characteristics of the human auditory system can be used to efficiently remove spatial perceptual redundancy in the quantization of spatial parameters. However, the quantization step sizes of spatial parameters in current SAC methods are not well correlated with the JND characteristics, which results in either spatial perceptual distortion or inefficient compression. A JND-based spatial parameter quantization (JSPQ) method is proposed in this paper. The quantization step sizes of spatial parameters are assigned according to the JND values of azimuths over a full circle. The quantization codebook size of JSPQ was 56.7 % lower than that of one of the quantization codebooks of MPEG Surround. The average bit rate reduction on spatial parameters for standard 5.1-channel signals reached up to approximately 13 % compared with MPEG Surround, while preserving comparable subjective spatial quality.
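The idea of assigning step sizes according to azimuthal JND values can be illustrated with a minimal Python sketch. It is not the JSPQ codebook from the paper: the function `toy_jnd_deg` is a hypothetical placeholder curve (not measured data), and `build_codebook` and `quantize` are illustrative names. The sketch only shows how a non-uniform codebook arises when each step equals the JND at the current azimuth.

```python
import math

def toy_jnd_deg(azimuth_deg):
    """Hypothetical JND curve in degrees (placeholder, NOT measured data):
    1 degree at front and back, growing to 5 degrees at the sides."""
    return 1.0 + 4.0 * abs(math.sin(math.radians(azimuth_deg)))

def build_codebook(jnd, start=0.0, stop=360.0):
    """Place azimuth levels so that the gap after each level equals
    the JND evaluated at that level."""
    levels = []
    az = start
    while az < stop:
        levels.append(az)
        az += jnd(az)
    return levels

def circ_dist(a, b):
    """Angular distance in degrees, measured around the circle."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def quantize(azimuth_deg, levels):
    """Return the index of the codebook level nearest to the azimuth."""
    return min(range(len(levels)),
               key=lambda i: circ_dist(azimuth_deg, levels[i]))

levels = build_codebook(toy_jnd_deg)
index = quantize(93.2, levels)
print(len(levels), index, levels[index])
```

Under these assumptions the codebook is dense where the placeholder JND is small and sparse where it is large, so it needs fewer entries than a uniform grid at the finest step, while the quantization error at any azimuth stays within half of the local step.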
