Paper information - Investigations on an EM-Style Optimization Algorithm for Discriminative Training of HMMs

Today's speech recognition systems are based on hidden Markov models (HMMs) with Gaussian mixture models whose parameters are estimated using a discriminative training criterion such as Maximum Mutual Information (MMI) or Minimum Phone Error (MPE). Currently, the optimization is almost always done with (empirical variants of) Extended Baum-Welch (EBW). This type of optimization requires sophisticated update schemes for the step sizes and a considerable amount of parameter tuning, and little is known about its convergence behavior. In this paper, we derive an EM-style algorithm for discriminative training of HMMs. Like Expectation-Maximization (EM) for the generative training of HMMs, the proposed algorithm improves the training criterion on each iteration, converges to a local optimum, and is completely parameter-free. We investigate the feasibility of the proposed EM-style algorithm for discriminative training on two tasks, namely grapheme-to-phoneme conversion and spoken digit string recognition.
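The abstract compares the proposed algorithm with generative EM, which improves the criterion on every iteration and needs no step size. The sketch below illustrates only that reference behavior. It is not the paper's discriminative algorithm, and it uses a 1-D two-component Gaussian mixture in place of an HMM to stay short. The function names, the synthetic data, and the initial values are all assumptions made for illustration.

```python
import math
import random


def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the mixture (the generative criterion)."""
    total = 0.0
    for x in data:
        total += math.log(
            sum(w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances))
        )
    return total


def em_step(data, weights, means, variances):
    """One EM iteration: posteriors (E-step), then closed-form re-estimation (M-step)."""
    # E-step: posterior probability of each component for each observation.
    posteriors = []
    for x in data:
        joint = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
        norm = sum(joint)
        posteriors.append([j / norm for j in joint])

    # M-step: no step size or other tuning constant appears in these updates.
    new_weights, new_means, new_variances = [], [], []
    for c in range(len(weights)):
        occupancy = sum(p[c] for p in posteriors)
        mean = sum(p[c] * x for p, x in zip(posteriors, data)) / occupancy
        var = sum(p[c] * (x - mean) ** 2 for p, x in zip(posteriors, data)) / occupancy
        new_weights.append(occupancy / len(data))
        new_means.append(mean)
        new_variances.append(var)
    return new_weights, new_means, new_variances


# Synthetic data (assumption): two well-separated clusters.
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(100)]
data += [random.gauss(3.0, 1.0) for _ in range(100)]

# Deliberately poor starting point (assumption).
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(20):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))

print("first and last log-likelihood:", history[0], history[-1])
```

The values in `history` should be non-decreasing up to floating-point rounding, which is the property the abstract claims for its EM-style algorithm with respect to the discriminative criterion. EBW-style updates, by contrast, rely on tuned step sizes.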
