A discriminative learning approach to probabilistic acoustic source localization

Sound source localization algorithms commonly evaluate inter-sensor (generalized) correlation functions to obtain direction-of-arrival estimates. Here, we present a classification-based method for source localization in which a discriminative support vector machine learns correlation patterns that indicate source presence or absence. Subsequent probabilistic modeling generates a map of sound source presence probability in given directions. Being data-driven, the method adapts during training to characteristics of the sensor setup, such as convolution effects in non-free-field situations, and to the specific acoustic properties of the target signal. For experimental evaluation, the algorithm was trained on anechoic single-talker scenarios and tested on several reverberant multi-talker situations with diffuse or real-recorded background noise. Results demonstrate that the method successfully generalizes from training to test conditions. Compared with the best of five investigated state-of-the-art angular-spectrum-based reference methods, the F-measure-related error was reduced by about 45% on average in relative terms.
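
As an illustration of the inter-sensor generalized correlation functions mentioned above, the following is a minimal sketch of one common variant, the phase transform (GCC-PHAT), for a single microphone pair. It is not the feature extraction used in this work; the function name `gcc_phat`, its parameters, and the small regularization constant are assumptions made for the example.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """Generic GCC-PHAT sketch (hypothetical helper, not the paper's code).

    Returns correlation values for lags -max_lag..max_lag in samples.
    A peak at a positive lag means x is delayed relative to y.
    max_lag must be a positive integer.
    """
    n = len(x) + len(y)              # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
```

In a classification-based scheme of the kind described here, a vector of such correlation values per time frame could serve as input to a classifier; how the features are actually built in this work is not specified in the abstract.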
