Large margin hidden Markov models for speech recognition

In this paper, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous-density hidden Markov models (CDHMMs) for speech recognition according to the principle of maximizing the minimum multiclass separation margin. We call the approach large margin HMM. First, we show that this large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Second, we propose to solve this constrained minimax optimization problem with a penalized gradient descent algorithm, in which the original objective function, i.e., the minimum margin, is approximated by a differentiable function and the constraints are cast as penalty terms in the objective function. The new training method is evaluated on the speaker-independent isolated E-set recognition task and the TIDIGITS connected digit string recognition task. Experimental results show that large margin HMMs consistently outperform HMMs trained with conventional methods. Moreover, large margin training yields a significant reduction in recognition error rate even when applied on top of some popular discriminative training methods.
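The two ingredients named in the abstract, a differentiable approximation of the minimum margin and constraints folded into the objective as penalty terms, can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the log-sum-exp smoothing, the two one-dimensional unit-variance Gaussian classes standing in for CDHMMs, the constraint that each mean stays within `radius` of its initial value, the finite-difference gradient, and all function and parameter names.

```python
import math


def smooth_min(values, eta=10.0):
    """Differentiable lower bound on min(values), built from log-sum-exp."""
    m = min(values)
    # Shifting by the true minimum keeps the exponentials from overflowing.
    s = sum(math.exp(-eta * (v - m)) for v in values)
    return m - math.log(s) / eta


def margins(means, tokens):
    """Separation margin of each (x, label) token: true score minus best rival."""
    out = []
    for x, label in tokens:
        # Toy discriminant: log-likelihood of a unit-variance Gaussian (up to a constant).
        scores = [-0.5 * (x - mu) ** 2 for mu in means]
        rival = max(s for k, s in enumerate(scores) if k != label)
        out.append(scores[label] - rival)
    return out


def objective(means, tokens, init_means, radius=1.0, eta=10.0, weight=10.0):
    """Negative smoothed minimum margin plus a quadratic penalty on violations."""
    # Assumed constraint: (mu - mu0)^2 <= radius^2 for every class mean.
    penalty = sum(max(0.0, (mu - mu0) ** 2 - radius ** 2) ** 2
                  for mu, mu0 in zip(means, init_means))
    return -smooth_min(margins(means, tokens), eta) + weight * penalty


def train(tokens, init_means, steps=200, lr=0.1, h=1e-5):
    """Gradient descent with central-difference gradients and step halving."""
    means = list(init_means)
    best = objective(means, tokens, init_means)
    for _ in range(steps):
        grad = []
        for k in range(len(means)):
            up = means[:k] + [means[k] + h] + means[k + 1:]
            down = means[:k] + [means[k] - h] + means[k + 1:]
            grad.append((objective(up, tokens, init_means)
                         - objective(down, tokens, init_means)) / (2 * h))
        trial = [mu - lr * g for mu, g in zip(means, grad)]
        value = objective(trial, tokens, init_means)
        if value < best:
            means, best = trial, value  # accept only steps that lower the objective
        else:
            lr *= 0.5  # step was too long; shorten it for the next iteration
    return means, best


if __name__ == "__main__":
    # Hypothetical training tokens: (observation, class label).
    data = [(-1.0, 0), (-0.2, 0), (0.3, 1), (1.2, 1)]
    start = [-0.5, 0.5]
    print("min margin before:", min(margins(start, data)))
    trained, _ = train(data, start)
    print("min margin after: ", min(margins(trained, data)))
```

Without the penalty term the margin in this toy problem could be made arbitrarily large by pushing the two means apart, which is one reason a constrained formulation is needed. A larger `eta` tracks the true minimum more closely, at the cost of a less smooth surface.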
