Direct Error Rate Minimization of Hidden Markov Models

We explore discriminative training of HMM parameters that directly minimizes the expected error rate. In discriminative training, the goal is to train a system to minimize a desired error function, such as word error rate, phone error rate, or frame error rate. We review a recent method (McAllester, Hazan and Keshet, 2010) that introduces an analytic expression for the gradient of the expected error rate. This expression leads to a perceptron-like update rule, which we adapt here for online training of HMMs. While the proposed method can work with any error function used in speech recognition, we evaluate it on phoneme recognition on the TIMIT corpus, with frame error rate as the training objective. Except in the case of a GMM with a single mixture component per state, the proposed update rule yields lower error rates, in terms of both frame error rate and phone error rate, than competing approaches, including MCE and large-margin training.
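The perceptron-like update derived from the analytic gradient can be sketched as follows. This is a minimal illustration of the direct-loss-minimization gradient estimate of McAllester et al. (2010) in its "toward-better" form, applied to a toy linear multiclass scorer rather than a full HMM decoder; the function name, the toy setup, and the 0/1 loss are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def direct_loss_update(W, x, y_true, loss, eps=0.1, lr=0.01):
    """One online direct-loss-minimization step for a linear scorer.

    Compares standard inference with loss-adjusted ("direct") inference and
    uses their feature difference, scaled by 1/eps, as a finite-difference
    estimate of the expected-loss gradient (McAllester et al., 2010).
    """
    scores = W @ x                          # score of each class given input x
    y_pred = int(np.argmax(scores))         # standard inference
    # loss-adjusted inference: bias the decoder toward lower-loss labels
    losses = np.array([loss(y, y_true) for y in range(W.shape[0])])
    y_direct = int(np.argmax(scores - eps * losses))
    # perceptron-like update: move toward y_direct, away from y_pred
    # (the two steps cancel when the loss term does not change the decode)
    W[y_direct] += (lr / eps) * x
    W[y_pred] -= (lr / eps) * x
    return W, y_pred
```

In the HMM setting of the paper, `y_pred` and `y_direct` would be state sequences found by Viterbi decoding with and without the loss adjustment, and `x` would be replaced by sufficient statistics of the observation sequence.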

[1] Lawrence K. Saul, et al. Comparison of Large Margin Training to Other Discriminative Methods for Phonetic Recognition by Hidden Markov Models, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).

[2] Samy Bengio, et al. Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, 2009.

[3] Lawrence K. Saul, et al. A fast online algorithm for large margin training of continuous density hidden Markov models, 2009, INTERSPEECH.

[4] Lawrence K. Saul, et al. Matrix updates for perceptron training of continuous density hidden Markov models, 2009, ICML '09.

[5] Shinji Watanabe, et al. Discriminative training based on an integrated view of MPE and MMI in margin and error space, 2010, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6] Tamir Hazan, et al. PAC-Bayesian approach for minimization of phoneme error rate, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Michael Collins, et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, 2002, EMNLP.

[8] S. Katagiri, et al. Discriminative Learning for Minimum Error Classification, 2009.

[9] Tamir Hazan, et al. Direct Loss Minimization for Structured Prediction, 2010, NIPS.

[10] Lalit R. Bahl, et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition, 1986, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86).

[11] Lawrence K. Saul, et al. Online Learning and Acoustic Feature Adaptation in Large-Margin Hidden Markov Models, 2010, IEEE Journal of Selected Topics in Signal Processing.

[12] Biing-Hwang Juang, et al. Discriminative learning for minimum error classification [pattern recognition], 1992, IEEE Transactions on Signal Processing.

[13] Hsiao-Wuen Hon, et al. Speaker-independent phone recognition using hidden Markov models, 1989, IEEE Transactions on Acoustics, Speech, and Signal Processing.

[14] Jonathan Le Roux, et al. Optimization methods for discriminative training, 2005, INTERSPEECH.

[15] Fei Sha, et al. Large Margin Training of Continuous Density Hidden Markov Models, 2009.

[16] Claudio Gentile, et al. On the generalization ability of on-line learning algorithms, 2001, IEEE Transactions on Information Theory.

[17] Daniel Povey, et al. Minimum Phone Error and I-smoothing for improved discriminative training, 2002, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18] Jonathan Le Roux, et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error, 2007, IEEE Transactions on Audio, Speech, and Language Processing.