Interpolation of combined head and room impulse response for audio spatialization

Audio spatialization is becoming an important part of creating the realistic experiences needed for immersive video conferencing and gaming. Using a combined head and room impulse response (CHRIR) has recently been proposed as an alternative to using separate head-related transfer functions (HRTFs) and room impulse responses (RIRs). Accurate measurements of the CHRIR at various source and listener locations and orientations are needed to perform high-quality audio spatialization. However, it is infeasible to accurately measure or model the CHRIR for all possible locations and orientations. Therefore, accurate, low-complexity interpolation techniques are needed to perform audio spatialization in real time. In this paper, we present a frequency-domain interpolation technique which naturally interpolates the interaural level difference (ILD) and interaural time difference (ITD) for each frequency component in the spectrum. The proposed technique allows for accurate, low-complexity interpolation of the CHRIR, and it enables a low-complexity audio spatialization technique which can be used with both headphones and loudspeakers.
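The abstract does not specify the interpolation formula, but the idea of interpolating level and time differences per frequency component can be illustrated with a generic sketch: interpolate the magnitude spectrum (which carries the level) and the unwrapped phase spectrum (which carries the delay) separately for each bin. This is a minimal illustration under that assumption, not the paper's exact method; the function name and linear weighting are hypothetical.

```python
import numpy as np

def interpolate_ir(h1, h2, alpha):
    """Interpolate between two impulse responses in the frequency domain.

    Per frequency bin, magnitudes are interpolated linearly (a proxy for
    level/ILD interpolation) and unwrapped phases are interpolated linearly
    (a proxy for delay/ITD interpolation). `alpha` in [0, 1] blends from
    h1 (alpha=0) to h2 (alpha=1). Both inputs must have the same length.
    """
    H1 = np.fft.rfft(h1)
    H2 = np.fft.rfft(h2)
    # Interpolate level per bin.
    mag = (1 - alpha) * np.abs(H1) + alpha * np.abs(H2)
    # Unwrap phases so that linear interpolation corresponds to
    # interpolating the group delay rather than wrapped angles.
    phase = (1 - alpha) * np.unwrap(np.angle(H1)) + alpha * np.unwrap(np.angle(H2))
    H = mag * np.exp(1j * phase)
    return np.fft.irfft(H, n=len(h1))

# Example: blending a 2-sample delay with a 4-sample delay at alpha=0.5
# yields (approximately) a 3-sample delay, i.e. the time difference is
# interpolated naturally instead of producing two smeared echoes.
h1 = np.zeros(16); h1[2] = 1.0
h2 = np.zeros(16); h2[4] = 1.0
h_mid = interpolate_ir(h1, h2, 0.5)
```

Note that plain time-domain cross-fading of the two responses would instead produce two half-amplitude impulses at samples 2 and 4, which is why a frequency-domain treatment of phase is attractive for ITD interpolation.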
