Paper Information - Large Margin Hidden Markov Models for Automatic Speech Recognition

Large Margin Hidden Markov Models for Automatic Speech Recognition

We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.
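The abstract names three ingredients: a margin-based objective, a parameter space of positive semidefinite matrices, and simple gradient-based optimization. The sketch below illustrates how these can fit together in a heavily simplified setting. It is not the authors' algorithm. It assumes one Gaussian-like ellipsoid per class and independent frame-level labels (no HMM, no mixtures, no sequence decoding), it scores a class with the quadratic form `z^T Phi_c z` where `z = [x; 1]`, and it omits any regularizer. The function names (`project_psd`, `hinge_loss_and_grads`, `train`, `predict`) are hypothetical.

```python
import numpy as np


def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues."""
    M = (M + M.T) / 2.0
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T


def augment(X):
    """Append a constant 1 to each feature vector: z = [x; 1]."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def hinge_loss_and_grads(Phis, X, y):
    """Mean hinge loss and its subgradient with respect to each class matrix.

    Assumed score: s_c(x) = z^T Phi_c z (smaller is better, like a distance).
    Assumed margin constraint: s_c(x) - s_y(x) >= 1 for every c != y.
    The loss is a hinge of a function linear in Phis, so it is convex in Phis.
    """
    n = X.shape[0]
    Z = augment(X)
    scores = np.einsum("ni,cij,nj->nc", Z, Phis, Z)       # shape (n, C)
    true = scores[np.arange(n), y][:, None]
    viol = np.maximum(0.0, 1.0 + true - scores)           # margin violations
    viol[np.arange(n), y] = 0.0                           # no penalty for the true class
    active = (viol > 0).astype(float)
    outer = np.einsum("ni,nj->nij", Z, Z)                 # z z^T per example
    # Each violated competitor c receives -z z^T ...
    grads = -np.einsum("nc,nij->cij", active, outer)
    # ... and the true class receives +z z^T once per violated competitor.
    np.add.at(grads, y, active.sum(axis=1)[:, None, None] * outer)
    return viol.sum() / n, grads / n


def train(X, y, num_classes, steps=50, lr=0.1):
    """Projected subgradient descent: gradient step, then projection onto the PSD cone."""
    d = X.shape[1]
    Phis = np.stack([np.eye(d + 1) for _ in range(num_classes)])
    for _ in range(steps):
        _, grads = hinge_loss_and_grads(Phis, X, y)
        Phis = np.stack([project_psd(P) for P in Phis - lr * grads])
    return Phis


def predict(Phis, X):
    """Assign each example to the class with the smallest quadratic score."""
    Z = augment(X)
    return np.einsum("ni,cij,nj->nc", Z, Phis, Z).argmin(axis=1)
```

A possible usage is `Phis = train(X, y, num_classes=2)` followed by `predict(Phis, X_test)`, where `X` is an `(n, d)` array of real-valued features and `y` an integer label vector. Because the hinge loss is convex in the matrices and the PSD cone is a convex set, this toy problem has no spurious local minima, which mirrors the property the abstract claims for the full framework.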

Lawrence K. Saul | Fei Sha
