Low Bitrate Coding of Spot Audio Signals for Interactive and Immersive Audio Applications

In the last few years, a revolution has occurred in the area of consumer audio. Similar to the transition from analog to digital sound that took place during the 1980s, we have been experiencing the transition from 2-channel stereophonic sound to multichannel sound (e.g., 5.1 systems). Future audiovisual systems will not distinguish between watching a movie and listening to a music recording; they are envisioned to offer a realistic experience in which the user is immersed in the content and can interact with it at will. This paper proposes an encoding procedure focused on spot microphone signals, a step that is necessary for providing interactivity between the user and the environment. The proposed model achieves high-quality audio reproduction with side information for each spot microphone signal on the order of 19 kbps.
