Large margin HMMs for speech recognition

Motivated by large margin classifiers in machine learning, we propose a novel method for estimating continuous density hidden Markov models (CDHMMs) for speech recognition according to the principle of maximizing the minimum multi-class separation margin. We call the approach large margin HMM. First, we show that large margin HMM estimation can be formulated as a standard constrained minimax optimization problem. Second, we propose an iterative localized optimization approach that performs the minimax optimization one model at a time, which guarantees that the optimal value of the objective function always exists in the course of model parameter optimization. Then, we show that at each step the optimization can be solved by the generalized probabilistic descent (GPD) algorithm if we approximate the objective function by a differentiable function, such as a summation of exponential functions. The large margin HMM-based classifiers are evaluated on a speaker-independent E-set speech recognition task using the OGI ISOLET database. Experimental results show that large margin HMMs achieve significant word error rate (WER) reduction over conventional HMM training methods, such as maximum likelihood estimation (MLE) and minimum classification error (MCE) training.
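
The smoothing step mentioned in the abstract can be illustrated with a minimal sketch. Maximizing the minimum margin is equivalent to minimizing the maximum of the negated margins, and a maximum is not differentiable. One common differentiable surrogate built from a sum of exponentials is the log-sum-exp function; whether this matches the exact form used in the paper is an assumption. The function names, the smoothing constant `eta`, and the margin values below are hypothetical and chosen only for illustration.

```python
import math

def smooth_max(values, eta=10.0):
    """Log-sum-exp surrogate for max(values); approaches the true max as eta grows."""
    m = max(values)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(eta * (v - m)) for v in values)) / eta

def smooth_max_weights(values, eta=10.0):
    """Partial derivatives of smooth_max with respect to each value (softmax weights)."""
    m = max(values)
    exps = [math.exp(eta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical negated separation margins for three training tokens.
# The token with the smallest margin (0.5) has the largest negated value (-0.5).
neg_margins = [-2.0, -0.5, -1.2]

approx = smooth_max(neg_margins, eta=20.0)
weights = smooth_max_weights(neg_margins, eta=20.0)
```

The surrogate always lies between the true maximum and the true maximum plus `log(n) / eta`, so a larger `eta` tightens the approximation at the cost of a sharper, less smooth objective. The weights show how a gradient-based update such as GPD would concentrate on the tokens with the smallest margins.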
