Transforming features to compensate speech recogniser models for noise

To make speech recognisers robust to noise, either the features or the models can be compensated. Feature enhancement is often fast; model compensation is often more accurate, because it predicts the distribution of the corrupted speech and can therefore, for example, take uncertainty about the clean speech into account. This paper re-analyses the recently proposed predictive linear transformations for noise compensation as minimising the KL divergence between the predicted corrupted speech and the adapted models. New schemes are then introduced that apply observation-dependent transformations in the front-end to adapt the back-end distributions. One of these applies transforms in exactly the same manner as the popular minimum mean square error (MMSE) feature enhancement scheme, and is just as fast, but performs better on AURORA 2.
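
The following is a minimal sketch of how the two ideas above could be written down, assuming a weighted per-component KL criterion and an affine observation-dependent front-end transform; the component weights \gamma^{(m)} and the affine form \mathbf{A}(\mathbf{y})\mathbf{y} + \mathbf{b}(\mathbf{y}) are illustrative assumptions, not necessarily the paper's exact formulation.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Predictive transforms as KL minimisation (illustrative sketch).
% For each back-end component m, a noise model and the clean-speech model
% predict a corrupted-speech distribution p(y | m). The adapted model
% q(y | m), constrained to a chosen transform family Q, is selected to
% minimise a weighted KL divergence to that prediction:
\begin{equation*}
  \hat{q} \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}}
  \sum_{m} \gamma^{(m)} \,
  \mathrm{KL}\!\left( p(\mathbf{y} \mid m) \,\middle\|\, q(\mathbf{y} \mid m) \right)
\end{equation*}
% where gamma^(m) is a component weight (e.g. a component prior); this
% weighting is assumed here for illustration. The observation-dependent
% front-end schemes instead apply a transform to each observation y_t,
\begin{equation*}
  \hat{\mathbf{x}}_t \;=\; \mathbf{A}(\mathbf{y}_t)\,\mathbf{y}_t + \mathbf{b}(\mathbf{y}_t),
\end{equation*}
% which is passed to the back-end in the same way that an MMSE estimate
% of the clean speech would be, so the decoding cost is comparable.
\end{document}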
