An unsupervised Bayesian classifier for multiple speaker detection and localization

Multiple speaker localization algorithms generally require a binary detector that classifies each location estimate as either source or noise. Such a detector is needed mainly because the number of sources is unknown and time-varying, and because of noise and reverberation. In this paper, we propose an unsupervised learning approach based on a naive Bayesian classifier. The proposed approach couples two speaker location features: 1) the steered response power at the location estimate, and 2) the corresponding maximum likelihood error, which characterizes the variance of the estimate. The latter is experimentally shown to be highly correlated with the steered power at the location estimate. The proposed method is further extended to control the misclassification rate through the use of a loss function. The approach is general and can easily be extended to integrate more speaker/speech features. Experiments on the AV16.3 corpus show the effectiveness of the proposed approach.
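To make the idea concrete, the following is a minimal sketch of an unsupervised two-class naive Bayes detector with a loss-weighted decision rule. It is not the authors' implementation. The Gaussian class-conditional densities, the EM fitting procedure, the median-based initialization, the function names (`fit_unsupervised`, `detect`), the loss parameters, and the synthetic data are all assumptions made for illustration. Each row of `X` is assumed to hold the two features of one location estimate: the steered response power and the maximum likelihood error.

```python
import numpy as np

def posterior(X, prior, mean, var):
    """Class posteriors under a naive Bayes model with 1-D Gaussian features."""
    diff = X[:, None, :] - mean                       # shape (n, 2, d)
    # Naive Bayes: the per-feature log densities are summed (independence given class).
    log_lik = -0.5 * (diff ** 2 / var + np.log(2 * np.pi * var)).sum(axis=2)
    log_joint = np.log(prior) + log_lik               # shape (n, 2)
    log_joint -= log_joint.max(axis=1, keepdims=True) # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def fit_unsupervised(X, n_iter=50, eps=1e-6):
    """Fit the two classes (0 = noise, 1 = source) without labels using EM."""
    # Assumed initialization: estimates with above-median power start as "source".
    high = X[:, 0] > np.median(X[:, 0])
    resp = np.stack([~high, high], axis=1).astype(float)
    for _ in range(n_iter):
        # M-step: priors, means, and per-feature variances from soft assignments.
        nk = resp.sum(axis=0) + eps
        prior = nk / nk.sum()
        mean = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - mean
        var = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None] + eps
        # E-step: update the soft assignments.
        resp = posterior(X, prior, mean, var)
    return prior, mean, var

def detect(X, params, loss_false_alarm=1.0, loss_miss=1.0):
    """Declare 'source' when that decision has the lower expected loss."""
    post = posterior(X, *params)
    return loss_false_alarm * post[:, 0] < loss_miss * post[:, 1]

# Synthetic features (assumed values): columns are [steered power, ML error].
rng = np.random.default_rng(0)
noise = np.column_stack([rng.normal(0.2, 0.05, 300), rng.normal(1.0, 0.2, 300)])
source = np.column_stack([rng.normal(0.8, 0.10, 300), rng.normal(0.3, 0.1, 300)])
X = np.vstack([noise, source])
truth = np.r_[np.zeros(300, dtype=bool), np.ones(300, dtype=bool)]

params = fit_unsupervised(X)
balanced = detect(X, params)                          # equal losses
cautious = detect(X, params, loss_false_alarm=10.0)   # false alarms cost more
print("accuracy with equal losses:", (balanced == truth).mean())
print("detections:", balanced.sum(), "->", cautious.sum())
```

Raising `loss_false_alarm` relative to `loss_miss` tightens the decision boundary, so fewer estimates are declared sources. This is one simple way a loss function can trade false alarms against missed detections. Additional features would enter the sketch as extra columns of `X`.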
