Geometry-Based Spatial Sound Acquisition Using Distributed Microphone Arrays

Traditional spatial sound acquisition aims at capturing a sound field with multiple microphones such that, at the reproduction side, a listener can perceive the sound image as it was at the recording location. Standard techniques for spatial sound acquisition usually use spaced omnidirectional microphones or coincident directional microphones. Alternatively, microphone arrays and spatial filters can be used to capture the sound field. From a geometric point of view, the perspective of the sound field is fixed when using such techniques. In this paper, a geometry-based spatial sound acquisition technique is proposed to compute virtual microphone signals that reflect a different perspective of the sound field. The proposed technique uses a parametric sound field model that is formulated in the time-frequency domain. It is assumed that each time-frequency instant of a microphone signal can be decomposed into one direct and one diffuse sound component. It is further assumed that the direct component is the response of a single isotropic point-like source (IPLS), whose position is estimated for each time-frequency instant using distributed microphone arrays. Given the sound components and the position of the IPLS, it is possible to synthesize a signal that corresponds to a virtual microphone at an arbitrary position and with an arbitrary pick-up pattern.
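
The synthesis step described above can be illustrated for a single time-frequency bin with the following minimal sketch. It is not the paper's formulation: it assumes free-field spherical spreading (1/r amplitude decay and a phase term exp(-jkr)) for the direct component, a first-order pick-up pattern, and a diffuse component that is passed through with a scalar gain. All names (`virtual_mic_bin`, `first_order_gain`, `alpha`, `diffuse_gain`) and the speed-of-sound constant are hypothetical choices made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_gain(alpha, cos_angle):
    """Assumed first-order pattern: alpha=1 omni, 0.5 cardioid, 0 figure-of-eight."""
    return alpha + (1.0 - alpha) * cos_angle


def virtual_mic_bin(direct_ref, diffuse_ref, freq_hz, ipls_pos, ref_pos,
                    vm_pos, vm_look, alpha=0.5, diffuse_gain=1.0):
    """Sketch of a virtual microphone (VM) signal for one time-frequency bin.

    direct_ref, diffuse_ref: complex direct and diffuse components observed
    at a reference microphone located at ref_pos.
    ipls_pos: estimated IPLS position for this bin.
    vm_pos, vm_look: position and look direction of the virtual microphone.
    """
    d_ref = math.dist(ipls_pos, ref_pos)
    d_vm = math.dist(ipls_pos, vm_pos)
    if d_ref == 0.0 or d_vm == 0.0:
        raise ValueError("microphone position coincides with the IPLS")

    # Assumption: compensate 1/r decay and propagation phase from the
    # reference position to the virtual microphone position.
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    propagation = (d_ref / d_vm) * cmath.exp(-1j * k * (d_vm - d_ref))

    # Cosine of the angle between the VM look direction and the direction
    # from the VM to the IPLS.
    to_source = [s - v for s, v in zip(ipls_pos, vm_pos)]
    look_norm = math.hypot(*vm_look)
    cos_angle = sum(a * b for a, b in zip(to_source, vm_look)) / (d_vm * look_norm)

    direct_vm = first_order_gain(alpha, cos_angle) * propagation * direct_ref
    # Assumption: diffuse sound is treated as position-independent.
    return direct_vm + diffuse_gain * diffuse_ref


# Usage: an omnidirectional VM placed halfway between reference and source.
y = virtual_mic_bin(1 + 0j, 0j, 0.0, (2.0, 0.0), (0.0, 0.0),
                    (1.0, 0.0), (1.0, 0.0), alpha=1.0)
```

In a complete system this function would be evaluated for every time-frequency bin with that bin's own position estimate, and the result transformed back to the time domain. The sign of the phase term depends on the transform convention in use and is an assumption here.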
