A generative-discriminative hybrid for sequential data classification [image classification example]

Classifying sequential data with discriminative models such as support vector machines (SVMs) is difficult because the sequences vary in length. Generative models such as hidden Markov models (HMMs), on the other hand, have become the standard tool for representing sequential data because of their efficiency. This paper proposes a general generative-discriminative framework that uses HMMs to map variable-length sequential data into a fixed-size P-dimensional vector of likelihood scores, which can then be easily classified by any discriminative model. In preliminary experiments on the MNIST database of handwritten digits, the framework achieved a recognition rate of 98.02%, compared with 94.19% for standard HMMs.
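The mapping from a variable-length sequence to a fixed-size score vector can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes discrete-observation HMMs, takes P to be the number of HMMs in a bank, and uses hand-set toy parameters (`hmm_a`, `hmm_b`) and helper names (`log_likelihood`, `score_vector`) that are purely hypothetical. Each sequence is scored against every HMM with the scaled forward algorithm, and the resulting log-likelihoods form the feature vector.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) for a discrete HMM via the scaled forward algorithm.

    start[i]   : probability of starting in state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][k] : probability of emitting symbol k in state i
    Assumes every prefix of obs has non-zero probability under the model.
    """
    n = len(start)
    # Forward variables for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission probability.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def score_vector(obs, hmms):
    """Map a sequence of any length to a len(hmms)-dimensional score vector."""
    return [log_likelihood(obs, *hmm) for hmm in hmms]


# Hypothetical toy bank of P = 2 two-state HMMs over binary symbols.
hmm_a = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]])
hmm_b = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
hmms = [hmm_a, hmm_b]

short_seq = [0, 0, 1]
long_seq = [0, 1, 1, 0, 1, 0, 0]

# Both sequences map to vectors of the same dimension P.
features = [score_vector(short_seq, hmms), score_vector(long_seq, hmms)]
print(features)
```

Because every sequence yields a vector of the same length, the rows of `features` could be passed to any fixed-input discriminative classifier, such as an SVM. A standard HMM classifier would instead pick the model with the highest score directly.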
