Multi-source TDOA estimation in reverberant audio using angular spectra and clustering

We consider the problem of estimating the time differences of arrival (TDOAs) of multiple sources from a two-channel reverberant audio signal. While several clustering-based and angular spectrum-based methods have been proposed in the literature, only relatively small-scale experimental evaluations, each restricted to one of the two categories of methods, have been carried out so far. We design and conduct the first large-scale experimental evaluation of these methods and investigate a two-step procedure combining angular spectra and clustering. In addition, we introduce and evaluate five new TDOA estimation methods inspired by signal-to-noise-ratio (SNR) weighting and probabilistic multi-source modeling techniques that have been successful for anechoic TDOA estimation and audio source separation. For 5 cm microphone spacing, the best TDOA estimation performance is achieved by one of the proposed SNR-based angular spectrum methods. For larger spacings, a variant of the generalized cross-correlation with phase transform (GCC-PHAT) method performs best.
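
As a rough illustration of the GCC-PHAT baseline named above, the following is a minimal sketch of textbook GCC-PHAT for a single dominant source in a two-channel signal. It is not the variant evaluated in the paper, and it handles neither multiple sources nor clustering. The function name `gcc_phat_tdoa`, its parameters, and the sign convention (positive TDOA means channel 2 lags channel 1) are assumptions made for this example.

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tdoa):
    """Estimate one TDOA (in seconds) between two channels with GCC-PHAT.

    Hypothetical helper for illustration only. A positive result means
    that x2 is a delayed copy of x1.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-spectrum with phase transform: keep the phase, drop the magnitude.
    cross = X2 * np.conj(X1)
    cross /= np.maximum(np.abs(cross), 1e-12)

    # Back to the lag domain: this plays the role of the angular spectrum.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags.
    max_lag = max(1, min(int(round(max_tdoa * fs)), n // 2 - 1))
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)

    return lags[np.argmax(cc)] / fs

# Synthetic check: channel 2 is channel 1 delayed by 3 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 3
x1 = s
x2 = np.concatenate((np.zeros(delay), s[:-delay]))
fs = 16000
tdoa = gcc_phat_tdoa(x1, x2, fs, max_tdoa=0.001)
print(tdoa * fs)  # estimated delay in samples
```

The search range `max_tdoa` would normally be bounded by the microphone spacing divided by the speed of sound; assuming roughly 343 m/s, a 5 cm spacing gives about 0.146 ms. A multi-source estimator would look for several peaks in `cc` rather than only the largest one.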
