Auditory model based direction estimation of concurrent speakers from binaural signals

Humans localize sounds robustly even in adverse conditions. Computational models of binaural sound localization and technical approaches to direction-of-arrival (DOA) estimation also perform well; however, both their binaural feature extraction and their strategies for further analysis partly differ from what is currently known about the human auditory system. This study investigates auditory-model-based DOA estimation, emphasizing known features and limitations of auditory binaural processing such as (i) high temporal resolution, (ii) a restricted frequency range for exploiting temporal fine structure, (iii) the use of temporal envelope disparities, and (iv) a limited range for compensating interaural time delay. DOA estimation performance was investigated for up to five concurrent speakers in the free field and for up to three speakers in the presence of noise. The DOA errors in these conditions were always smaller than 5°. A condition with moving speakers was also tested, and up to three moving speakers could be tracked simultaneously. An analysis of DOA performance as a function of binaural temporal resolution showed that the short time constants of about 5 ms employed by the auditory model were crucial for robustness against concurrent sources.
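To make two of the listed constraints concrete, the following minimal sketch estimates an interaural time difference (ITD) in short frames of about 5 ms while searching only a limited lag range, and then maps the ITD to an azimuth. It is a generic cross-correlation illustration, not the auditory model used in the study. The function names, the 1 ms lag limit, the 0.18 m ear distance, and the sine-law mapping are all assumptions chosen for illustration.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=1e-3):
    """Lag (in seconds) within +/- max_itd that maximizes the cross-correlation.

    A positive result means the right-ear signal lags the left-ear signal.
    max_itd is an assumed search limit, not a value taken from the study.
    """
    max_lag = int(round(max_itd * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    n = len(left)
    xcorr = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            # sum over k of left[k] * right[k + lag]
            xcorr[i] = np.dot(left[:n - lag], right[lag:])
        else:
            xcorr[i] = np.dot(left[-lag:], right[:n + lag])
    return lags[np.argmax(xcorr)] / fs

def framewise_itd(left, right, fs, frame_dur=5e-3, max_itd=1e-3):
    """ITD estimate per non-overlapping frame of frame_dur seconds."""
    frame_len = int(round(frame_dur * fs))
    n_frames = len(left) // frame_len
    return np.array([
        estimate_itd(left[i * frame_len:(i + 1) * frame_len],
                     right[i * frame_len:(i + 1) * frame_len],
                     fs, max_itd)
        for i in range(n_frames)
    ])

def itd_to_azimuth(itd, ear_distance=0.18, c=343.0):
    """Azimuth in degrees from a simple sine-law head model (assumed geometry)."""
    return np.degrees(np.arcsin(np.clip(itd * c / ear_distance, -1.0, 1.0)))

# Usage: white noise reaching the right ear 4 samples later than the left ear.
fs = 16000
rng = np.random.default_rng(0)
left = rng.standard_normal(1600)
right = np.concatenate([np.zeros(4), left[:-4]])
itds = framewise_itd(left, right, fs)
azimuth = itd_to_azimuth(np.median(itds))
```

Short frames give each estimate a fine time resolution, which is what lets frame-wise estimates follow whichever source dominates at a given moment. Restricting the lag range keeps the search within physically plausible delays for a head-sized receiver.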
