Forward Decoding Kernel Machines: A Hybrid HMM/SVM Approach to Sequence Recognition

Forward Decoding Kernel Machines (FDKM) combine large-margin classifiers with Hidden Markov Models (HMMs) for Maximum a Posteriori (MAP) adaptive sequence estimation. State transitions in the sequence are conditioned on observed data using a kernel-based probability model, and forward decoding of the state transition probabilities with the sum-product algorithm directly produces the MAP sequence. The parameters of the probabilistic model are trained by a recursive scheme that maximizes a lower bound on the regularized cross-entropy. The recursion performs an expectation step over the outgoing state of the transition probability model, using the posterior probabilities produced by the previous maximization step. As in Expectation-Maximization (EM), the FDKM recursion deals effectively with noisy and partially labeled data. We also introduce GiniSVM, a multi-class support vector machine for sparse conditional probability regression, based on a quadratic formulation of entropy. Experiments on benchmark classification data show that GiniSVM generalizes better than other multi-class SVM techniques. In conjunction with FDKM, GiniSVM produces a sparse kernel expansion of the state transition probabilities, with drastically fewer non-zero coefficients than kernel logistic regression. Preliminary evaluation of FDKM with GiniSVM on a subset of the TIMIT speech database shows significant improvements in phoneme recognition accuracy over other SVM and HMM techniques.
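
Made explicit, the forward decoding described above propagates posterior state probabilities through the kernel-based transition model. With states q_n and observations x_n (our notation, not taken from the abstract), the sum-product recursion and the per-step MAP readout are

    \alpha_i[n] = \sum_j P(q_n = i \mid q_{n-1} = j, x_n) \, \alpha_j[n-1], \qquad \hat{q}_n = \arg\max_i \alpha_i[n].

The "quadratic formulation of entropy" behind GiniSVM we read as the Gini index, a quadratic lower bound on Shannon entropy whose substitution for the entropy term would turn the training problem into a quadratic program, as in standard SVM training:

    H_{Gini}(p) = \sum_i p_i (1 - p_i) = 1 - \sum_i p_i^2 \le -\sum_i p_i \ln p_i.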
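
As a concrete sketch of the forward pass, the following minimal NumPy routine implements the recursion above. All names and array layouts here are illustrative assumptions, not the authors' implementation; in FDKM the transition probabilities trans_prob[n, i, j] would be produced by a trained GiniSVM or kernel logistic regression model, which is outside the scope of this sketch.

    import numpy as np

    def forward_decode(trans_prob, alpha0):
        """Sum-product forward decoding of a state sequence.

        trans_prob: (T, S, S) array; trans_prob[n, i, j] estimates
            P(q_n = i | q_{n-1} = j, x_n) from a kernel machine.
        alpha0: (S,) initial state distribution.
        Returns the MAP state at each step and the forward variables.
        """
        T, S, _ = trans_prob.shape
        alpha = np.empty((T, S))
        prev = np.asarray(alpha0, dtype=float)
        seq = np.empty(T, dtype=int)
        for n in range(T):
            a = trans_prob[n] @ prev   # alpha_i[n] = sum_j P(i | j, x_n) * alpha_j[n-1]
            a /= a.sum()               # renormalize to avoid underflow on long sequences
            alpha[n] = a
            seq[n] = np.argmax(a)      # MAP state estimate at step n
            prev = a
        return seq, alpha

    # Toy usage: 3 steps, 2 states, uninformative transition model.
    probs = np.full((3, 2, 2), 0.5)
    seq, alpha = forward_decode(probs, np.array([0.5, 0.5]))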
