Large-Margin Discriminative Training of Hidden Markov Models for Speech Recognition

Discriminative training has been a leading factor for improving automatic speech recognition (ASR) performance over the last decade. The traditional discriminative training, however, has been aimed to minimize empirical error rates on training sets, which may not be well generalized to test sets. Many attempts have been made recently to incorporate the principle of large margin (PLM) into the training of hidden Markov models (HMMs) in ASR to improve the generalization abilities. Significant error rate reduction on the test sets has been observed on both small vocabulary and large vocabulary continuous ASR tasks using large-margin discriminative training (LMDT) techniques. In this paper, we introduce the PLM, define the concept of margin in the HMMs, and survey a number of popular LMDT algorithms proposed and developed recently. Specifically, we review and compare the large-margin minimum classification error (LM-MCE) estimation, soft-margin estimation (SME), large margin estimation (LME), large relative margin estimation (LRME), and large margin training (LMT) with a focus on the insights, the training criteria, the optimization techniques used, and the strengths and weaknesses of these different approaches. We suggest future research directions in our conclusion of this paper.

[1]  Jinyu Li,et al.  Soft margin estimation of hidden Markov model parameters , 2006, INTERSPEECH.

[2]  Lawrence K. Saul,et al.  Large margin training of acoustic models for speech recognition , 2007 .

[3]  Hui Jiang,et al.  A constrained joint optimization method for large margin HMM estimation , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[4]  Daniel Povey,et al.  Large scale discriminative training of hidden Markov models for speech recognition , 2002, Comput. Speech Lang..

[5]  Hui Jiang,et al.  Recent Improvement on Maximum Relative Margin Estimation of HMMS for Speech Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[6]  Jonathan Le Roux,et al.  Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  Lawrence K. Saul,et al.  Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[8]  Hui Jiang,et al.  Discriminative training of CDHMMs for maximum relative separation margin , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[9]  Hui Jiang,et al.  Large margin HMMs for speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[10]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[11]  Qi Tian,et al.  Musical genre classification using support vector machines , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[12]  Hui Jiang,et al.  Incorporating Training Errors for Large Margin HMMS Under Semi-Definite Programming Framework , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[13]  Steve J. Young,et al.  MMI training for continuous phoneme recognition on the TIMIT database , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Biing-Hwang Juang,et al.  Minimum classification error rate methods for speech recognition , 1997, IEEE Trans. Speech Audio Process..

[15]  Dong Yu,et al.  Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[16]  Jinyu Li,et al.  Approximate Test Risk Minimization Through Soft Margin Estimation , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[17]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.