Weight-Space Viterbi Decoding Based Spectral Subtraction for Reverberant Speech Recognition

This letter proposes a single-channel blind dereverberation algorithm for distant-talking speech recognition. The method is based on spectral subtraction (SS), in which the spectrum of the late reverberant signal is estimated from a delayed and attenuated version of the reverberant signal. Under certain simplifying assumptions, the conventional SS method treats the attenuation weight as a constant that is a function of the reverberation time. However, these assumptions do not hold in real situations, and the ideal weight varies from frame to frame. The proposed method therefore estimates a variable weight sequence using a Viterbi decoding scheme based on the reverberation model. This weight sequence is then substituted for the fixed weight in the conventional SS method, without explicitly estimating the reverberation time. The proposed method outperforms the conventional SS method in both isolated word recognition and connected digit recognition experiments in reverberant environments.
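The contrast between a fixed weight and a frame-wise weight sequence can be illustrated with a minimal sketch. It is not the letter's implementation: the abstract does not give the scoring used in the Viterbi search, so `frame_cost`, the candidate weight grid, the smoothness penalty `smooth`, the spectral floor, and all function names below are hypothetical. The fixed weight assumes an exponential energy decay of 60 dB over the reverberation time, which is one common model and may differ from the one used in the letter.

```python
import numpy as np


def fixed_weight(t60, delay_seconds):
    """Constant power-domain weight, ASSUMING exponential decay of 60 dB per t60."""
    return 10.0 ** (-6.0 * delay_seconds / t60)


def viterbi_weights(frame_cost, candidates, smooth=1.0):
    """Minimum-cost weight sequence over a discrete grid of candidate weights.

    frame_cost[t, k] is a HYPOTHETICAL cost of using candidates[k] at frame t.
    The transition cost smooth * |w_prev - w_cur| discourages abrupt changes.
    """
    frame_cost = np.asarray(frame_cost, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    n_frames, n_states = frame_cost.shape
    # trans[prev, cur]: cost of moving from one candidate weight to another
    trans = smooth * np.abs(candidates[:, None] - candidates[None, :])
    acc = frame_cost[0].copy()
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        total = acc[:, None] + trans  # total[prev, cur]
        back[t] = np.argmin(total, axis=0)
        acc = total[back[t], np.arange(n_states)] + frame_cost[t]
    # Trace back the best path from the cheapest final state
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmin(acc)
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return candidates[path]


def subtract_late_reverb(power, weights, delay, floor=0.01):
    """Subtract a weighted, delayed copy of the power spectrogram.

    power: (n_frames, n_bins); weights: scalar (fixed) or (n_frames,) sequence;
    delay: number of frames (>= 1); floor: fraction of the input kept as a minimum.
    """
    if delay < 1:
        raise ValueError("delay must be at least one frame")
    power = np.asarray(power, dtype=float)
    w = np.broadcast_to(np.asarray(weights, dtype=float), (power.shape[0],))
    delayed = np.zeros_like(power)
    delayed[delay:] = power[:-delay]
    cleaned = power - w[:, None] * delayed
    return np.maximum(cleaned, floor * power)
```

Passing a scalar from `fixed_weight` reproduces the conventional constant-weight case, whereas passing the output of `viterbi_weights` applies a different weight to each frame without requiring a reverberation time estimate.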
