Comparison of Large Margin Training to Other Discriminative Methods for Phonetic Recognition by Hidden Markov Models

In this paper we compare three frameworks for discriminative training of continuous-density hidden Markov models (CD-HMMs). Specifically, we compare two popular frameworks, based on conditional maximum likelihood (CML) and minimum classification error (MCE), to a new framework based on margin maximization. Unlike CML and MCE, our formulation of large margin training explicitly penalizes incorrect decodings by an amount proportional to the number of mislabeled hidden states. It also leads to a convex optimization over the parameter space of CD-HMMs, thus avoiding the problem of spurious local minima. We used discriminatively trained CD-HMMs from all three frameworks to build phonetic recognizers on the TIMIT speech corpus. The different recognizers employed exactly the same acoustic front end and hidden state space, thus enabling us to isolate the effect of different cost functions, parameterizations, and numerical optimizations. Experimentally, we find that our framework for large margin training yields significantly lower error rates than both CML and MCE training.

[1]  Jonathan Le Roux,et al.  Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Biing-Hwang Juang,et al.  Discriminative learning for minimum error classification [pattern recognition] , 1992, IEEE Trans. Signal Process..

[3]  A. Nadas,et al.  A decision theorectic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood , 1983 .

[4]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[5]  Lawrence K. Saul,et al.  Large Margin Hidden Markov Models for Automatic Speech Recognition , 2006, NIPS.

[6]  Hui Jiang,et al.  Large margin HMMs for speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[7]  Lawrence K. Saul,et al.  Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[8]  James R. Glass,et al.  Heterogeneous acoustic measurements for phonetic classification 1 , 1997, EUROSPEECH.

[9]  S. Katagiri,et al.  Discriminative Learning for Minimum Error Classification , 2009 .

[10]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[11]  Lalit R. Bahl,et al.  Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Steve J. Young,et al.  MMI training for continuous phoneme recognition on the TIMIT database , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Daniel Povey,et al.  Large scale discriminative training of hidden Markov models for speech recognition , 2002, Comput. Speech Lang..

[14]  Steve J. Young,et al.  MMIE training of large vocabulary recognition systems , 1997, Speech Communication.

[15]  James R. Glass,et al.  HETEROGENEOUS ACOUSTIC MEASUREMENTS FOR PHONETIC CLASSIFICATION , 1997 .