Combining missing-data reconstruction and uncertainty decoding for robust speech recognition

This paper proposes a novel approach to noise-robust speech recognition that combines a spectral reconstruction technique derived from missing-data (MD) theory with uncertainty decoding based on the weighted Viterbi algorithm (WVA). First, the noisy feature vectors are compensated using a novel MD imputation technique based on the integration of truncated Gaussian pdfs. Although the proposed MD estimator combines the advantages of MD techniques with those of cepstral features, it may still be affected by several sources of uncertainty. To deal with these uncertainties, WVA-based uncertainty decoding is proposed. Our experiments on the Aurora-2 and Aurora-4 tasks show that the proposed MD estimator outperforms other MD imputation techniques. We also show that combining MD imputation with WVA yields better results than combining it with other uncertainty processing techniques, such as the use of evidence pdfs for the estimated features.
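The two ingredients named above can be illustrated with a minimal Python sketch. It is not the estimator or decoder evaluated in the paper: it assumes a single scalar Gaussian (rather than a mixture over spectral vectors), treats the noisy observation as an upper bound on the clean value, and uses a common form of weighted Viterbi decoding in which each frame's observation log-likelihood is scaled by a reliability weight in [0, 1]. The function names `truncated_gaussian_mean` and `weighted_viterbi` and all parameters are hypothetical.

```python
import math


def truncated_gaussian_mean(mu, sigma, upper):
    """Mean of N(mu, sigma^2) restricted to x <= upper.

    Illustrative bounded estimate: the noisy observation `upper` is assumed
    to be an upper bound on the clean value. For bounds far below `mu` the
    cdf underflows; a production version would guard against that.
    """
    beta = (upper - mu) / sigma
    pdf = math.exp(-0.5 * beta * beta) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))
    return mu - sigma * pdf / cdf


def weighted_viterbi(log_init, log_trans, log_obs, weights):
    """Best state path when frame t's observation log-likelihood is scaled
    by weights[t] (1 = fully reliable, 0 = ignore the frame).

    log_init[j]     : log prior of state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log-likelihood of frame t under state j
    """
    n_states = len(log_init)
    score = [log_init[j] + weights[0] * log_obs[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(
                score[best_i] + log_trans[best_i][j] + weights[t] * log_obs[t][j]
            )
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch, a frame whose reconstruction is judged unreliable receives a weight near 0, so its acoustic score contributes little and the transition probabilities dominate the decision at that frame.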
