Zero-Crossing Based Binaural Mask Estimation for Missing Data Speech Recognition

This paper presents a new zero-crossing based method of binaural mask estimation for missing data speech recognition when multiple sound sources are present simultaneously. The mask is determined from the estimated directions of the sound sources, using spatial cues such as inter-aural time differences (ITDs) and inter-aural intensity differences (IIDs). In the proposed method, ITDs are estimated from the statistical properties of zero-crossings generated from binaural filter-bank outputs. We also consider estimating ITDs with the aid of IID samples to cope with the phase ambiguities of ITD samples at high frequencies. As a result, the proposed method provides an accurate estimate of sound source directions and a good masking scheme for speech recognition, while requiring significantly less computation than cross-correlation based methods.
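To make the zero-crossing idea concrete, the following is a minimal sketch of estimating an ITD from the upward zero-crossings of two channels. It is not the paper's algorithm: the abstract does not specify the statistics used, so this sketch simply pairs each left-channel crossing with the nearest right-channel crossing and takes the median of the time differences. The function names (`upward_zero_crossings`, `estimate_itd`), the `max_itd` bound, the sign convention (positive means the right channel lags), and the single 500 Hz sinusoid standing in for one filter-bank channel are all assumptions made for illustration.

```python
import math
from statistics import median


def upward_zero_crossings(signal, fs):
    """Return times (in seconds) of negative-to-nonnegative crossings.

    Each crossing time is refined by linear interpolation between the
    two samples that straddle zero.
    """
    times = []
    for n in range(len(signal) - 1):
        a, b = signal[n], signal[n + 1]
        if a < 0.0 <= b:
            frac = -a / (b - a)  # fractional position of the zero
            times.append((n + frac) / fs)
    return times


def estimate_itd(left, right, fs, max_itd=1e-3):
    """Median time difference between paired zero-crossings.

    Assumption: a positive result means the right channel lags the left.
    Pairs further apart than max_itd (hypothetical bound) are discarded.
    """
    left_zc = upward_zero_crossings(left, fs)
    right_zc = upward_zero_crossings(right, fs)
    if not right_zc:
        return None
    samples = []
    for t in left_zc:
        nearest = min(right_zc, key=lambda u: abs(u - t))
        diff = nearest - t
        if abs(diff) <= max_itd:
            samples.append(diff)
    return median(samples) if samples else None


# Synthetic test signal: one 500 Hz tone, right channel delayed by 0.3 ms.
fs = 16000            # sampling rate in Hz (assumed)
freq = 500.0          # tone frequency in Hz (assumed)
true_delay = 0.3e-3   # seconds
num_samples = 800     # 50 ms of signal

left = [math.sin(2.0 * math.pi * freq * n / fs) for n in range(num_samples)]
right = [
    math.sin(2.0 * math.pi * freq * (n / fs - true_delay))
    for n in range(num_samples)
]

itd = estimate_itd(left, right, fs)
print("estimated ITD (s):", itd)
```

The nearest-crossing pairing is only unambiguous while the true delay stays below half the period of the band's center frequency, which illustrates why high-frequency ITD samples suffer from phase ambiguity and need an additional cue such as the IID.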
