Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression

The precedence effect describes the ability of the auditory system to suppress the later-arriving components of sound in a reverberant environment, so that the perceived arrival azimuth of a sound remains in the direction of the actual source even though later reverberant components may arrive from other directions. It is widely believed that precedence-like processing can also improve speech intelligibility, as well as the accuracy of speech recognition systems, in reverberant environments. While the mechanisms underlying the precedence effect have traditionally been assumed to be binaural in nature, it is also possible that the suppression of later components takes place monaurally, and that the suppression of the later-arriving components of the spatial image is a consequence of this more peripheral processing. This paper compares the potential contributions of onset enhancement (and the consequent steady-state suppression) of the envelopes of subband components of speech at both the monaural and binaural levels. Experimental results indicate that substantial improvement in recognition accuracy can be obtained in reverberant environments if the feature extraction includes both onset enhancement and binaural interaction. Recognition accuracy appears to be relatively unaffected by the stage in the suppression processing at which the binaural interaction takes place.
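The idea of onset enhancement with steady-state suppression can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper. It assumes that the slowly varying part of each subband envelope is tracked with a first-order low-pass filter and subtracted, and that binaural interaction is approximated by simply averaging the left and right envelopes. The function names and the parameters `forgetting`, `weight`, and `floor` are hypothetical choices made for illustration.

```python
def suppress_steady_state(envelope, forgetting=0.9, weight=1.0, floor=0.01):
    """Enhance onsets in one subband envelope (illustrative assumption,
    not the paper's exact processing)."""
    out = []
    # Running low-pass estimate of the slowly varying (steady-state) part.
    running = envelope[0] if envelope else 0.0
    for x in envelope:
        running = forgetting * running + (1.0 - forgetting) * x
        # Subtract the steady-state estimate; floor at a fraction of the
        # input so the output never becomes negative.
        out.append(max(x - weight * running, floor * x))
    return out


def combine_then_suppress(left, right, **kwargs):
    """Binaural interaction first (here: plain averaging, an assumption),
    followed by steady-state suppression."""
    combined = [0.5 * (l + r) for l, r in zip(left, right)]
    return suppress_steady_state(combined, **kwargs)


def suppress_then_combine(left, right, **kwargs):
    """Monaural steady-state suppression in each ear first,
    followed by the same averaging step."""
    sl = suppress_steady_state(left, **kwargs)
    sr = suppress_steady_state(right, **kwargs)
    return [0.5 * (a + b) for a, b in zip(sl, sr)]


if __name__ == "__main__":
    # A step envelope: silence followed by a sustained segment.
    env = [0.0] * 3 + [1.0] * 10
    y = suppress_steady_state(env)
    # The onset sample is passed almost unchanged; later samples decay.
    print([round(v, 3) for v in y])
```

With a step-shaped envelope, the first sample after the onset is largely preserved while the sustained portion is progressively attenuated. This is the behavior expected to reduce the influence of reverberant tails. The two wrapper functions differ only in where the left/right combination happens, mirroring the comparison of processing stages described in the abstract.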
