Clustering-Based Construction of Hidden Markov Models for Generative Kernels

Generative kernels are theoretically grounded tools that extend the capabilities of generative classification by embedding it in a discriminative setting. The Fisher kernel is the first and most widely used representative, and it rests on a thoroughly investigated mathematical foundation. A generative kernel is built through a two-step serial pipeline. In the first, "generative" step, a generative model is trained, using either one model per class or a single model for all the data; then features, or scores, are extracted that encode the contribution of each data point to the generative process. In the second, "discriminative" step, the scores are evaluated by a discriminative machine via a kernel, exploiting the separability of the data. In this paper we contribute to the first step, proposing a novel way to fit generative models to the class data, focusing specifically on Hidden Markov Models (HMMs). The idea is to perform model-based clustering on the unlabeled data in order to best uncover the structure of the entire sample set. The label information is then retrieved and the generative scores are computed. A comparative experimental evaluation gives a preliminary indication of the quality of the novel approach and encourages further development.
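The scoring half of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the clustering step has already produced a set of discrete HMMs (the two hard-coded models below are hypothetical), it uses length-normalized log-likelihoods as the generative scores rather than Fisher scores, and it uses a plain linear kernel. All function names are illustrative.

```python
import math

def log_likelihood(seq, start, trans, emit):
    """Scaled forward algorithm: log P(seq | HMM) for a discrete HMM.

    seq is a non-empty list of symbol indices; start, trans and emit are
    the initial, transition and emission probabilities as nested lists.
    """
    n_states = len(start)
    alpha = [start[i] * emit[i][seq[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][seq[t]]
                for j in range(n_states)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

def score_vector(seq, models):
    """One length-normalized log-likelihood per HMM (assumed score space)."""
    return [log_likelihood(seq, *m) / len(seq) for m in models]

def assign_cluster(seq, models):
    """Hard assignment of a sequence to its most likely HMM."""
    scores = score_vector(seq, models)
    return max(range(len(models)), key=scores.__getitem__)

def linear_kernel(u, v):
    """Inner product between two score vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical 2-state HMMs over the symbols {0, 1}, standing in for
# the models that a clustering step would produce.
model_a = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]])
model_b = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]])
models = [model_a, model_b]

seq_x = [0, 0, 0, 0]
seq_y = [1, 1, 1, 1]
k_xy = linear_kernel(score_vector(seq_x, models), score_vector(seq_y, models))
```

In a full system the kernel values would be passed to a discriminative classifier such as an SVM, and the hard assignment in `assign_cluster` would alternate with re-estimation of the HMM parameters to perform the clustering itself.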
