Speech enhancement using combination of dereverberation and noise reduction for robust speech recognition

In this paper, we describe a speech enhancement approach for robust speech recognition. The approach consists of two stages that address two major problems in speech recognition: reverberation and noise. First, the speech signal is dereverberated by suppression of slowly-varying components and the falling edge of the power envelope (SSF). Then binaural speech processing is applied to remove noise from the target speech. Speech recognition results show that the combined algorithm provides good robustness in real environments.
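The dereverberation stage can be illustrated with a minimal sketch. It assumes the commonly described "type I" form of SSF: each sub-band power envelope is smoothed by a first-order low-pass filter, the smoothed value is subtracted from the instantaneous power, and the result is floored at a small fraction of the input. The function name `ssf_type1`, the forgetting factor, and the floor coefficient are illustrative assumptions, not values taken from this paper, and the filterbank analysis and resynthesis steps are omitted.

```python
import numpy as np


def ssf_type1(power, forgetting=0.4, floor=0.01):
    """Hypothetical SSF-style processing of sub-band power envelopes.

    power: array of shape (frames, channels) with non-negative power values.
    forgetting, floor: assumed parameter values, chosen for illustration only.
    """
    power = np.asarray(power, dtype=float)
    processed = np.empty_like(power)
    # Running low-pass estimate of the envelope, started at the first frame.
    lowpassed = power[0].copy()
    for m in range(power.shape[0]):
        lowpassed = forgetting * lowpassed + (1.0 - forgetting) * power[m]
        # Subtracting the low-passed envelope removes slowly varying power
        # and falling edges; the floor keeps the output non-negative.
        processed[m] = np.maximum(power[m] - lowpassed, floor * power[m])
    return processed


# One channel: silence, an onset, a sustained portion, then a decay.
envelope = np.array([[0.0], [0.0], [4.0], [4.0], [1.0]])
enhanced = ssf_type1(envelope, forgetting=0.5, floor=0.01)
```

In this toy envelope the onset frame keeps most of its power, while the decaying final frame is pushed down to the floor. That is the intended effect on reverberant tails, which appear as slow decays after each onset.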
