A Sparsity-Based Approach to 3D Binaural Sound Synthesis Using Time-Frequency Array Processing

Localization of sounds in physical space plays an important role in many audio-related disciplines, such as music, telecommunications, and audiovisual production. Binaural recording is the most commonly used method for providing an immersive sound experience over headphones. However, it requires a very specific recording setup, with high-fidelity microphones mounted in a dummy head. In this paper, we present a novel processing framework for binaural sound recording and reproduction that avoids the use of dummy heads and is especially suitable for immersive teleconferencing applications. The method is based on a time-frequency analysis of the spatial properties of the sound picked up by a simple tetrahedral microphone array, assuming source sparseness. Experiments carried out using both simulations and a real-time prototype confirm the validity of the proposed approach.
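The abstract only summarizes the method. The Python/NumPy sketch below is one possible way to implement a per-bin spatial analysis under source sparseness. It is not the authors' algorithm. Every name in it is hypothetical, and several choices are assumptions made for illustration: the 1.5 cm capsule radius, the speed of sound, the estimator (it solves the inter-capsule phase differences of each time-frequency bin for a single plane-wave direction), the use of capsule 0 as the reference signal, and an ITD-only spherical-head (Woodworth) model standing in for real HRTF filtering. The phase cue is only unambiguous while it does not wrap, which for the assumed geometry means frequencies below roughly 7 kHz.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def tetra_positions(radius=0.015):
    """Capsule positions in metres for a hypothetical regular tetrahedral array.

    Capsules sit on alternate corners of a cube, each `radius` metres from the
    centre (1.5 cm is an illustrative value, not taken from the paper).
    """
    v = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
    return radius * v / np.sqrt(3.0)


def doa_per_bin(X, freqs, mics, c=SPEED_OF_SOUND):
    """Estimate one arrival direction per time-frequency bin.

    X: complex STFTs of the four capsules, shape (4, F, T).
    freqs: bin frequencies in Hz, shape (F,).
    mics: capsule positions in metres, shape (4, 3).
    Returns unit vectors pointing towards the dominant source, shape (3, F, T).

    Sparseness assumption: one plane wave dominates each bin, so
    X_m ~ S * exp(j*2*pi*f*(p_m . u)/c) and the phase of X_m * conj(X_0)
    gives (p_m - p_0) . u, as long as that phase does not wrap.
    """
    baselines = mics[1:] - mics[0]                          # (3, 3)
    phase = np.angle(X[1:] * np.conj(X[0]))                 # (3, F, T)
    omega = 2 * np.pi * np.asarray(freqs, dtype=float)[:, None]
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = np.where(omega > 0, c * phase / omega, 0.0)  # DC bins carry no phase cue
    _, F, T = proj.shape
    # Three baselines give a square system, so solve baselines @ u = proj directly.
    u = np.linalg.solve(baselines, proj.reshape(3, -1)).reshape(3, F, T)
    return u / np.maximum(np.linalg.norm(u, axis=0, keepdims=True), 1e-12)


def azimuth_elevation(u):
    """Azimuth and elevation in degrees (x = front, y = left, z = up)."""
    az = np.degrees(np.arctan2(u[1], u[0]))
    el = np.degrees(np.arcsin(np.clip(u[2], -1.0, 1.0)))
    return az, el


def binaural_from_doa(X, freqs, mics, head_radius=0.0875, c=SPEED_OF_SOUND):
    """Render left/right STFTs with a direction-dependent filter per bin.

    Stand-in for HRTF filtering: a pure interaural time difference from the
    Woodworth spherical-head formula ITD = (a/c) * (theta + sin(theta)), with
    theta the lateral angle. A real renderer would apply measured HRTFs for
    each estimated direction instead.
    """
    u = doa_per_bin(X, freqs, mics, c)
    lateral = np.arcsin(np.clip(u[1], -1.0, 1.0))        # > 0 for sources on the left
    itd = head_radius / c * (lateral + np.sin(lateral))  # seconds, shape (F, T)
    f = np.asarray(freqs, dtype=float)[:, None]
    ref = X[0]                                           # capsule 0 as the reference signal
    left = ref * np.exp(1j * np.pi * f * itd)            # left ear advanced by ITD / 2
    right = ref * np.exp(-1j * np.pi * f * itd)          # right ear delayed by ITD / 2
    return left, right


def plane_wave(az_deg, el_deg, S, freqs, mics, c=SPEED_OF_SOUND):
    """Ideal far-field STFT of source spectrum S (F, T) arriving from (az, el)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    steer = np.exp(2j * np.pi * np.asarray(freqs)[None, :] * (mics @ u)[:, None] / c)
    return steer[:, :, None] * S[None, :, :]


# Usage: two sources occupying disjoint time-frequency bins.
mics = tetra_positions()
freqs = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
rng = np.random.default_rng(0)
spec = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))
mask = rng.random((5, 8)) < 0.5
X = plane_wave(60, 20, spec * mask, freqs, mics) + plane_wave(-45, -10, spec * ~mask, freqs, mics)
az, el = azimuth_elevation(doa_per_bin(X, freqs, mics))
print(np.round(az, 1))
```

Because each bin gets its own direction estimate, two sources that never share a bin are recovered separately without any explicit separation step. With reverberation or overlapping sources the per-bin estimates would scatter, so a practical system would likely need smoothing or clustering on top of this sketch.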
