Discriminative adaptive training with VTS and JUD

Adaptive training is a powerful approach for building speech recognition systems on non-homogeneous training data. Recently, approaches based on predictive model-based compensation schemes, such as Joint Uncertainty Decoding (JUD) and Vector Taylor Series (VTS), have been proposed. This paper reviews these model-based compensation schemes and relates them to factor-analysis style systems. Forms of Maximum Likelihood (ML) adaptive training with these approaches are described, based on both second-order optimisation schemes and Expectation Maximisation (EM). However, discriminative training is used in many state-of-the-art speech recognition systems. Hence, this paper proposes discriminative adaptive training with predictive model-based compensation approaches for noise-robust speech recognition. This training approach is applied to both JUD and VTS compensation using minimum phone error (MPE) training. A large-scale multi-environment training configuration is used, and the systems are evaluated on a range of tasks based on data collected in cars.
