Nonlinear Transformations of Marginalisation Mappings for Kernels on Hidden Markov Models

Many problems in machine learning involve variable-size structured data, such as sets, sequences, trees, and graphs. Generative (i.e. model-based) kernels are well suited to structured data, since they can capture its underlying structure by incorporating prior information through the specification of the source models. In this paper we focus on marginalisation kernels for variable-length sequences generated by hidden Markov models. In particular, we propose a new class of generative embeddings, obtained through a nonlinear transformation of the original marginalisation mappings. This makes it possible to embed the input data into a new feature space where better separation can be achieved, and it leads to a new kernel defined as the inner product in the transformed feature space. Several nonlinear transformations are proposed, and two different ways of applying them to the original mappings are considered. The main contribution of this paper is a proof that the proposed nonlinear transformations increase the margin of the optimal hyperplane of an SVM classifier, thus enhancing classification performance. The proposed mappings are tested on two different sequence classification problems, with highly satisfactory results that outperform state-of-the-art methods.
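The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact construction: it assumes the marginalisation mapping of a sequence is its vector of expected state-occupancy counts under a discrete HMM (summed forward-backward posteriors), and uses an elementwise square root as one hypothetical choice of nonlinear transformation; the resulting kernel is the inner product of the transformed embeddings.

```python
import numpy as np

def forward_backward_posteriors(obs, pi, A, B):
    """Posterior state probabilities gamma[t, i] for a discrete-emission HMM.

    pi: initial state distribution (N,); A: transition matrix (N, N);
    B: emission matrix (N, M); obs: sequence of symbol indices.
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    # Forward pass
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # Backward pass
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def marginalisation_embedding(obs, pi, A, B):
    """Embed a variable-length sequence as expected state-occupancy counts."""
    return forward_backward_posteriors(obs, pi, A, B).sum(axis=0)

def transformed_kernel(x_obs, y_obs, pi, A, B, transform=np.sqrt):
    """Inner product of nonlinearly transformed marginalisation embeddings."""
    phi_x = transform(marginalisation_embedding(x_obs, pi, A, B))
    phi_y = transform(marginalisation_embedding(y_obs, pi, A, B))
    return float(phi_x @ phi_y)
```

Since the transform acts elementwise on a fixed-length embedding, the kernel remains a valid inner-product kernel and can be plugged directly into an SVM; the embeddings of sequences of different lengths are comparable because each has dimension equal to the number of hidden states.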
