Binaural detection of speech sources in complex acoustic scenes

We present a novel system that simultaneously localizes and detects a predefined number of speech sources in complex acoustic scenes from binaural signals. The system operates in two steps. First, a binaural front-end analyzes the acoustic scene and detects relevant sound source activity, yielding a set of candidate source positions. Second, a speech detection module selects from these candidates the positions that most likely correspond to speech. The proposed method is evaluated in simulated multi-source scenarios consisting of two speech sources, three interfering noise sources, and reverberation.
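The second step can be illustrated with a minimal sketch. It assumes that the front-end has already produced a list of candidate azimuths and that each candidate carries a scalar speech score; the names `Candidate`, `speech_score` and `select_speech_sources`, the top-K selection rule, and all numeric values are hypothetical and chosen for illustration only. The abstract does not specify how the speech detection module scores or selects candidates.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    azimuth_deg: float   # candidate source position (assumed front-end output)
    speech_score: float  # hypothetical speech-likelihood score, higher = more speech-like


def select_speech_sources(candidates: List[Candidate], num_sources: int) -> List[Candidate]:
    """Keep the num_sources candidates with the highest speech score.

    This top-K rule is an assumed stand-in for the speech detection
    module, not the method described in the paper.
    """
    if num_sources < 0:
        raise ValueError("num_sources must be non-negative")
    # Rank candidates from most to least speech-like.
    ranked = sorted(candidates, key=lambda c: c.speech_score, reverse=True)
    # Keep the predefined number of sources and report them ordered by azimuth.
    return sorted(ranked[:num_sources], key=lambda c: c.azimuth_deg)


# Made-up example scene: five candidate positions, of which two are to be
# labeled as speech (mirroring two speech and three noise sources).
candidates = [
    Candidate(-60.0, 0.12),
    Candidate(-20.0, 0.81),
    Candidate(0.0, 0.35),
    Candidate(30.0, 0.77),
    Candidate(75.0, 0.05),
]
selected = select_speech_sources(candidates, num_sources=2)
print([c.azimuth_deg for c in selected])
```

Fixing the number of selected positions in advance reflects the "predefined number of speech sources" stated above; a threshold on the score would be an alternative when the number of talkers is unknown.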
