Distant-talking speech recognition using multi-channel LMS and multiple-step linear prediction

Dereverberation methods based on generalized spectral subtraction (GSS) using multi-channel least mean squares (MCLMS) and multiple-step linear prediction (MSLP) have previously been proposed. Both methods blindly estimate the characteristics of the late reverberation and then suppress it by spectral subtraction. The speech recognition performance of both methods varies with the length of the late reverberation to be estimated. In this paper, we investigate the effect of the estimated late-reverberation length on distant-talking speech recognition. We also propose a method that combines MCLMS and MSLP. The results show that MCLMS-based dereverberation is effective for long reverberation of approximately 200 ms, whereas MSLP-based dereverberation is effective for short reverberation of approximately 100 ms. The proposed “MSLP+MCLMS” method (that is, MCLMS applied after MSLP) outperformed all other dereverberation methods.
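
The spectral-subtraction step mentioned above can be illustrated with a minimal sketch. The abstract does not give the subtraction formula, so the form below is an assumption: a commonly used generalized spectral subtraction with a spectral exponent, an over-subtraction factor, and a spectral floor. The function name and the parameters `exponent`, `alpha`, and `beta` are hypothetical, and the late-reverberation magnitude estimate is assumed to have been produced beforehand (for example by an MCLMS- or MSLP-based estimator, which is not shown here).

```python
import numpy as np


def generalized_spectral_subtraction(observed_mag, late_reverb_mag,
                                     exponent=1.0, alpha=1.0, beta=0.1):
    """Suppress an estimated late-reverberation spectrum (illustrative only).

    observed_mag    : magnitude spectrogram of the reverberant speech
    late_reverb_mag : estimated magnitude spectrogram of the late reverberation
    exponent        : spectral exponent n (assumed; n = 1 is power subtraction)
    alpha           : over-subtraction factor (assumed)
    beta            : spectral floor relative to the observation (assumed)
    """
    observed_pow = observed_mag ** (2.0 * exponent)
    reverb_pow = late_reverb_mag ** (2.0 * exponent)

    # Subtract the scaled late-reverberation estimate in the |.|^(2n) domain.
    subtracted = observed_pow - alpha * reverb_pow

    # Floor the result so that no bin becomes negative or vanishes entirely.
    floored = np.maximum(subtracted, beta * observed_pow)

    # Return to the magnitude domain.
    return floored ** (1.0 / (2.0 * exponent))


# Toy 2x2 "spectrograms" with made-up values, purely for demonstration.
observed = np.array([[1.0, 2.0], [3.0, 4.0]])
late = np.array([[0.5, 2.0], [1.0, 5.0]])
enhanced = generalized_spectral_subtraction(observed, late)
```

In this sketch, bins where the late-reverberation estimate exceeds the observation fall back to the floor `beta` times the observed value rather than going negative. The values of `alpha` and `beta` would need tuning for any real system.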