A Robust Method to Count and Locate Audio Sources in a Multichannel Underdetermined Mixture

We propose a method to count the sources and estimate their mixing directions in an underdetermined multichannel mixture. The approach is based on the hypothesis that, in the neighborhood of some time-frequency points, essentially only one source contributes to the mixture: such time-frequency points can provide robust local estimates of the corresponding source direction. At the core of our contribution is a statistical model that exploits a local confidence measure, which detects the time-frequency regions where such robust information is available. A clustering algorithm called DEMIX is proposed to merge the information from all time-frequency regions according to their confidence level. To estimate the delays of anechoic mixtures and overcome the intrinsic phase-unwrapping ambiguities encountered with DUET, we propose a technique similar to GCC-PHAT that can estimate delays far exceeding one sample. An extensive experimental study shows that the resulting method is robust in conditions where all comparable DUET-like methods fail, in particular (a) when time delays far exceed one sample and (b) when the source directions are very close.
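The abstract describes the delay estimator only as "similar to GCC-PHAT". As an illustration of that general idea, and not of the paper's actual technique, the following minimal sketch estimates an integer delay between two channels with the standard GCC-PHAT cross-correlation. The function name `gcc_phat_delay` and the parameter `max_delay` are hypothetical, and the sketch assumes a single dominant source and two real-valued channels.

```python
import numpy as np

def gcc_phat_delay(x1, x2, max_delay):
    """Estimate the integer delay (in samples) of x2 relative to x1.

    Illustrative GCC-PHAT sketch; assumes one dominant source.
    """
    if max_delay < 1:
        raise ValueError("max_delay must be at least 1")
    n = len(x1) + len(x2)  # zero-pad so the circular correlation does not wrap
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    # PHAT weighting: discard magnitudes and keep only the phase
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the lags run from -max_delay to +max_delay
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(cc)) - max_delay
```

Because the peak is searched in the lag domain rather than read off a single phase value, the estimate is not limited to delays below one sample, which is the ambiguity the abstract attributes to phase unwrapping in DUET. A positive result means `x2` lags behind `x1`.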
