Auditory scene analysis by time-delay analysis with three microphones

We propose two methods for disambiguating the results of time-delay-based detection and localization of sound sources when a triangle of microphones is used for signal acquisition. A standard approach is to build a histogram of time differences of arrival (TDOAs) for each microphone pair in the triangular array and then average these histograms. However, each individual histogram can determine a source's orientation uniquely only within a local range of [-π/2, π/2]. Averaging the histograms of different pairs is therefore inappropriate, and such a method suffers from ambiguous results over the full range of orientations, [0, 2π]. Our first proposal is a delay vector transformation method, which combines corresponding delay measurements into vectors and transforms them into a 2-D space in which a full-range orientation histogram can finally be established and analyzed. In our second method, the individual orientation histograms obtained for the pairs of microphones are analyzed first, and two competing hypotheses are created for each detected source. A final clustering of the hypothesis set then yields a unique orientation estimate for each source.
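The core of the ambiguity is geometric: a single microphone pair observes only the projection of the source direction onto its baseline, so a TDOA from one pair is consistent with two mirror-image azimuths. Stacking the three pairwise delays of the triangle into one vector and mapping it back to a 2-D direction removes this ambiguity. The following is a minimal, idealized far-field sketch of that idea, not the authors' exact transformation; the array geometry, pair ordering, and the least-squares mapping are illustrative assumptions.

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

# Hypothetical equilateral microphone triangle (coordinates in metres).
mics = np.array([[0.0, 0.0],
                 [0.2, 0.0],
                 [0.1, 0.2 * np.sqrt(3) / 2]])
pairs = [(0, 1), (1, 2), (2, 0)]  # one TDOA per microphone pair

def tdoa_far_field(theta):
    """Ideal noiseless far-field TDOAs (seconds) for a source at azimuth theta."""
    u = np.array([np.cos(theta), np.sin(theta)])  # unit direction of arrival
    return np.array([(mics[i] - mics[j]) @ u / C for i, j in pairs])

def full_range_azimuth(taus):
    """Map the 3-element delay vector to a 2-D direction by least squares.

    Each delay satisfies tau_ij = (p_i - p_j) . u / C, so solving the stacked
    linear system for u gives a point whose angle is an unambiguous azimuth
    in [0, 2*pi) -- a single pair alone could not distinguish theta from its
    mirror image across the pair's baseline.
    """
    A = np.array([(mics[i] - mics[j]) / C for i, j in pairs])
    u, *_ = np.linalg.lstsq(A, taus, rcond=None)
    return np.arctan2(u[1], u[0]) % (2 * np.pi)

theta_true = np.deg2rad(250.0)  # "behind" the array: ambiguous for any single pair
theta_est = full_range_azimuth(tdoa_far_field(theta_true))
```

With noisy delays, one such 2-D point would be computed per time-frequency observation and the full-range orientation histogram accumulated over the resulting angles.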
