An Uncertainty Propagation Approach to Robust ASR Using the ETSI Advanced Front-End

In this paper, we show how uncertainty propagation, combined with observation uncertainty techniques, can be applied to a realistic implementation of robust distributed speech recognition (DSR) to further improve recognition robustness with little increase in computational complexity. Uncertainty propagation (or error propagation) techniques employ a probabilistic description of speech to reflect the information lost during speech enhancement or source separation in the time or frequency domain. This uncertain description is then propagated through the feature extraction process to the feature domain used by the speech recognizer. In that domain, observation uncertainty techniques combine the propagated statistical information with the statistical parameters of the recognition model. We show that combining a piecewise uncertainty propagation scheme with front-end uncertainty decoding or modified imputation improves on the baseline of the advanced front-end (AFE), the state-of-the-art algorithm of the European Telecommunications Standards Institute (ETSI), on the AURORA5 database. We compare this method with other observation uncertainty techniques and show that uncertainty propagation reduces word error rates without requiring stereo data for noise adaptation or iterative parameter estimation.
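The abstract describes two stages: propagating a probabilistic speech estimate through feature extraction, and combining the resulting feature statistics with the recognizer's Gaussians. The Python sketch below illustrates both under assumptions that are ours, not the paper's. Each STFT bin is given a circular complex Gaussian posterior (as a Wiener-type estimator would produce), bins are treated as independent, the chain is simplified to power spectrum, mel filterbank, log and DCT (the AFE's actual processing is more involved), the log stage uses lognormal moment matching, and the filterbank and DCT matrices are hypothetical. Uncertainty decoding is shown in its common variance-compensation form (feature variance added to model variance) and modified imputation as a per-dimension MAP estimate; neither is meant as the authors' exact implementation.

```python
import math

# Minimal sketch of piecewise uncertainty propagation plus two observation
# uncertainty rules. All matrices, sizes and distributional assumptions are
# hypothetical illustrations, not the ETSI AFE processing chain.


def power_moments(mean, var):
    """Mean and variance of |X|^2 for X ~ CN(mean, var), a circular complex Gaussian."""
    m2 = abs(mean) ** 2
    return m2 + var, var * var + 2.0 * m2 * var


def linear(W, mu, cov):
    """Exact mean/covariance propagation through y = W x: (W mu, W cov W^T)."""
    n, m = len(mu), len(W)
    mu_y = [sum(W[i][k] * mu[k] for k in range(n)) for i in range(m)]
    wc = [[sum(W[i][k] * cov[k][l] for k in range(n)) for l in range(n)] for i in range(m)]
    cov_y = [[sum(wc[i][l] * W[j][l] for l in range(n)) for j in range(m)] for i in range(m)]
    return mu_y, cov_y


def log_lognormal(mu, cov):
    """Propagate through an element-wise log, assuming (hypothetically) a lognormal input."""
    n = len(mu)
    cov_z = [[math.log1p(cov[i][j] / (mu[i] * mu[j])) for j in range(n)] for i in range(n)]
    mu_z = [math.log(mu[i]) - 0.5 * cov_z[i][i] for i in range(n)]
    return mu_z, cov_z


def dct_matrix(n_ceps, n_mel):
    """Unnormalized DCT-II matrix (an assumed choice, not the AFE's exact DCT)."""
    return [[math.cos(math.pi * k * (j + 0.5) / n_mel) for j in range(n_mel)]
            for k in range(n_ceps)]


def propagate(post_mean, post_var, mel, dct):
    """STFT posterior -> power -> mel -> log -> cepstrum.
    Returns the cepstral mean and the diagonal of its covariance."""
    moments = [power_moments(m, v) for m, v in zip(post_mean, post_var)]
    mu = [p for p, _ in moments]
    n = len(mu)
    # Assumption: STFT bins are independent, so the power covariance is diagonal.
    cov = [[moments[i][1] if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu, cov = linear(mel, mu, cov)    # overlapping filters introduce correlations
    mu, cov = log_lognormal(mu, cov)
    mu, cov = linear(dct, mu, cov)
    return mu, [cov[i][i] for i in range(len(mu))]


def gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def uncertainty_decoding(x_mu, x_var, m_mu, m_var):
    """Score the feature mean with the feature variance added to the model variance."""
    return gauss_loglik(x_mu, m_mu, [vm + vx for vm, vx in zip(m_var, x_var)])


def modified_imputation(x_mu, x_var, m_mu, m_var):
    """Per-dimension MAP estimate of the clean feature, then score it with the model."""
    x_hat = [(x * vm + m * vx) / (vm + vx)
             for x, vx, m, vm in zip(x_mu, x_var, m_mu, m_var)]
    return x_hat, gauss_loglik(x_hat, m_mu, m_var)
```

Full covariances are carried through the mel and log stages because overlapping filters correlate neighboring mel channels; only the diagonal is returned at the end, matching diagonal-covariance acoustic models. With all posterior variances set to zero, the propagated mean reduces to the ordinary point-estimate features and both decoding rules reduce to standard Gaussian scoring.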
