Binaural speech enhancement system combining dereverberation and spatial masking-based noise removal for robust speech recognition

In this paper, we present a binaural speech enhancement system that serves as a preprocessing step for robust speech recognition. The system applies an existing dereverberation technique followed by a spatial masking-based noise removal algorithm, which uses a threshold angle to retain only signals arriving from the desired directions. While state-of-the-art approaches fix the threshold angle heuristically for all time frames, we propose computing it adaptively: the threshold angle is first learned from several noise-only frames and then updated frame by frame. Speech recognition results in a real environment show the effectiveness of the proposed speech enhancement approach.
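The abstract does not specify how the arrival angle is estimated or how the threshold is updated, so the following Python sketch only illustrates the general idea of spatial masking with an adaptive threshold angle. It is not the authors' method, and the dereverberation stage is omitted. The assumptions are as follows. Per-bin angles come from the interaural phase difference under a far-field model with a hypothetical microphone spacing `mic_distance`. The threshold update rule is also hypothetical: the threshold starts at `alpha` times the median absolute angle over the first `n_noise_frames` frames, which are assumed to be noise-only. After that, it is smoothed with forgetting factor `beta` toward the angular spread of the bins currently rejected as noise. All function names are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def bin_angles(x_left, x_right, fs, n_fft, mic_distance):
    """Estimate a per-bin arrival angle in radians (0 = straight ahead).

    x_left, x_right: complex STFTs of shape (frames, n_fft // 2 + 1).
    Far-field model; spatial aliasing is ignored, so mic_distance must be
    small enough that the phase difference stays within (-pi, pi].
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)       # bin centre frequencies in Hz
    ipd = np.angle(x_right * np.conj(x_left))        # interaural phase difference
    safe_freqs = np.where(freqs > 0, freqs, np.inf)  # DC bin -> angle 0
    # Phase difference -> inter-microphone delay -> sine of the arrival angle.
    sin_theta = SPEED_OF_SOUND * ipd / (2 * np.pi * safe_freqs * mic_distance)
    return np.arcsin(np.clip(sin_theta, -1.0, 1.0))


def adaptive_spatial_mask(angles, n_noise_frames=10, alpha=0.5, beta=0.9):
    """Binary time-frequency mask with a frame-by-frame threshold angle.

    Hypothetical rule: the threshold is initialised to alpha * median(|angle|)
    over the first n_noise_frames (assumed noise-only), then smoothed with
    forgetting factor beta toward alpha * median(|angle|) of the bins
    currently rejected as noise.
    """
    abs_angles = np.abs(angles)
    threshold = alpha * np.median(abs_angles[:n_noise_frames])
    mask = np.zeros(angles.shape, dtype=bool)
    thresholds = np.empty(angles.shape[0])
    for t in range(angles.shape[0]):
        if t >= n_noise_frames:
            noise_bins = abs_angles[t][abs_angles[t] > threshold]
            if noise_bins.size:  # keep the previous threshold if no bin looks like noise
                estimate = alpha * np.median(noise_bins)
                threshold = beta * threshold + (1 - beta) * estimate
        thresholds[t] = threshold
        mask[t] = abs_angles[t] <= threshold  # retain bins from the desired direction
    return mask, thresholds


def apply_mask(x_left, x_right, mask):
    """Average the two channels and zero out bins outside the threshold angle."""
    return np.where(mask, 0.5 * (x_left + x_right), 0.0)
```

Smoothing the threshold with a forgetting factor keeps it from jumping between frames. Updating it only from rejected bins is meant to let it follow the noise directions rather than the target speech.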
