A Multichannel Sinusoidal Model Applied to Spot Microphone Signals for Immersive Audio

In this paper, a multichannel version of the sinusoids plus noise model (also known as the deterministic plus stochastic decomposition) is proposed and applied to the spot microphone signals of a music recording. Spot microphone signals are the recordings captured by the various microphones placed in a venue, before the mixing process produces the final multichannel audio mix. Coding these signals makes them available at the decoder, allowing for interactive audio reproduction, which is a necessary component of immersive audio applications. The proposed model uses a single reference audio signal to derive a noise signal for each spot microphone. This noise signal can significantly enhance the sinusoidal representation of the corresponding spot signal. Depending on the application, the reference can be one of the spot signals or a downmix. Thus, for a collection of spot signals, only the reference is fully encoded (e.g., as a monophonic MP3 signal). For each remaining spot signal, its sinusoidal parameters and the spectral envelope of its noise are retained and coded. The resulting side-information bitrate is on the order of 15 kb/s, with perceptual quality above grade 4.0 on the mean opinion score (MOS) scale.
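The two pieces of side information named above (sinusoidal parameters, and a noise spectral envelope that is applied to a noise signal derived from the reference) can be illustrated with a minimal single-frame sketch. It makes several assumptions that the abstract does not confirm: the noise envelope is modeled with linear prediction (LPC); the reference residual is whitened by its own LPC inverse filter and then colored by the spot residual's all-pole envelope; there is no partial tracking, no quantization, and the gain is computed directly from the spot residual energy. The helper names `sinusoidal_analysis`, `lpc` and `derive_spot_noise` are hypothetical. This is a sketch of the idea, not the paper's exact algorithm.

```python
# Hypothetical sketch: LPC is ASSUMED as the noise-envelope model; single frame only.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import find_peaks, lfilter


def sinusoidal_analysis(frame, fs, num_peaks=20, zero_pad=2):
    """Simplified single-frame sinusoidal analysis (no partial tracking).

    Frequencies: peak picking plus parabolic interpolation on the log magnitude
    of a zero-padded, Hann-windowed spectrum. Amplitudes and phases: least-squares
    fit of a cosine/sine pair at each frequency. Returns
    (freqs_hz, amps, phases, residual), with residual = frame - sinusoids.
    """
    nfft = zero_pad * len(frame)
    n = np.arange(len(frame))
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)), nfft)
    mag_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
    peaks, _ = find_peaks(mag_db)
    peaks = peaks[np.argsort(mag_db[peaks])[::-1][:num_peaks]]  # keep strongest peaks
    freqs = []
    for k in peaks:
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        denom = a - 2 * b + c
        offset = 0.5 * (a - c) / denom if denom != 0 else 0.0  # parabola vertex
        freqs.append((k + offset) * fs / nfft)
    omega = 2 * np.pi * np.asarray(freqs) / fs
    basis = np.hstack([np.cos(np.outer(n, omega)), np.sin(np.outer(n, omega))])
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    cos_c, sin_c = coeffs[: len(omega)], coeffs[len(omega):]
    # amp*cos(w n + ph) = amp*cos(ph)*cos(w n) - amp*sin(ph)*sin(w n)
    amps, phases = np.hypot(cos_c, sin_c), np.arctan2(-sin_c, cos_c)
    return np.asarray(freqs), amps, phases, frame - basis @ coeffs


def lpc(x, order):
    """Autocorrelation-method LPC; returns A(z) coefficients [1, -a_1, ..., -a_p]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    r[0] *= 1.0 + 1e-9  # tiny white-noise correction for numerical safety
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    return np.concatenate(([1.0], -a))


def derive_spot_noise(ref_residual, spot_residual, order=16):
    """Color the whitened reference residual with the spot residual's envelope."""
    a_ref = lpc(ref_residual, order)
    a_spot = lpc(spot_residual, order)  # the envelope that would be coded
    excitation = lfilter(a_ref, [1.0], ref_residual)  # whiten with A_ref(z)
    shaped = lfilter([1.0], a_spot, excitation)  # color with 1 / A_spot(z)
    gain = np.sqrt(np.sum(spot_residual**2) / (np.sum(shaped**2) + 1e-12))
    return gain * shaped  # same energy as the spot residual


if __name__ == "__main__":
    fs, n = 44100, np.arange(2048)
    rng = np.random.default_rng(0)
    # Hypothetical frames: a shared 440 Hz tone plus differently colored noise.
    ref = np.sin(2 * np.pi * 440 * n / fs) + 0.05 * rng.standard_normal(n.size)
    spot = 0.8 * np.sin(2 * np.pi * 440 * n / fs + 0.3) + lfilter(
        [1.0], [1.0, -0.9], 0.05 * rng.standard_normal(n.size)
    )
    _, _, _, ref_res = sinusoidal_analysis(ref, fs)
    freqs, amps, phases, spot_res = sinusoidal_analysis(spot, fs)
    noise = derive_spot_noise(ref_res, spot_res)
    print("strongest partial (Hz):", freqs[np.argmax(amps)])
    print("residual RMS:", np.sqrt(np.mean(spot_res**2)))
    print("derived-noise RMS:", np.sqrt(np.mean(noise**2)))
```

In this sketch, only `freqs`, `amps`, `phases`, the `lpc(spot_res, order)` coefficients and a gain (or residual energy) would need to be sent for a spot signal. The decoder already has the reference, so it can recompute `ref_res` itself. Whitening before coloring is a design choice: it keeps the temporal fine structure of the reference residual while imposing the spot signal's coarse spectral shape.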
