Learning state machine-based string edit kernels

During the past few years, several approaches have been proposed to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. Marginalized kernels, on the other hand, compute the joint similarity between two instances by summing conditional probabilities. In this paper, we adapt this approach to edit distance-based conditional distributions and present a way to learn a new string edit kernel. We show that the practical computation of such a kernel between two strings x and x' built from an alphabet Σ requires (i) learning edit probabilities in the form of the parameters of a stochastic state machine and (ii) computing an infinite sum over Σ* by resorting to the intersection of probabilistic automata, as done for rational kernels. We show on a handwritten character recognition task that our new kernel outperforms not only state-of-the-art string kernels and string edit kernels but also the standard edit distance used by a neighborhood-based classifier.
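To make the two ingredients concrete, the following is a minimal sketch and not the method of the paper. It assumes the kernel takes the marginalized form K(x, x') = sum over y in Σ* of p(y|x) p(y|x'), where p(y|x) comes from a memoryless conditional edit model. The edit probabilities below are made-up constants rather than learned parameters, and the infinite sum is simply truncated at a maximum string length instead of being computed exactly through an intersection of probabilistic automata. All names (`cond_prob`, `truncated_kernel`, `P_INSERT`, and so on) are hypothetical.

```python
from itertools import product

# Toy two-letter alphabet and hand-picked (NOT learned) edit probabilities.
ALPHABET = "ab"
P_INSERT = 0.05    # probability of inserting one given symbol
P_DELETE = 0.10    # probability of deleting the current input symbol
P_MATCH = 0.70     # probability of copying the current input symbol
P_MISMATCH = 0.10  # probability of substituting it by the other symbol
# Probability of stopping once the input is consumed: 1 - total insertion mass.
P_END = 1.0 - P_INSERT * len(ALPHABET)


def cond_prob(y, x):
    """p(y | x) under the toy memoryless model, by a forward recursion
    that sums over all edit scripts turning x into y."""
    n, m = len(x), len(y)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete x[i-1]
                alpha[i][j] += alpha[i - 1][j] * P_DELETE
            if j > 0:  # insert y[j-1]
                alpha[i][j] += alpha[i][j - 1] * P_INSERT
            if i > 0 and j > 0:  # substitute x[i-1] by y[j-1]
                sub = P_MATCH if x[i - 1] == y[j - 1] else P_MISMATCH
                alpha[i][j] += alpha[i - 1][j - 1] * sub
    return alpha[n][m] * P_END


def truncated_kernel(x1, x2, max_len=6):
    """Approximate sum over y of p(y|x1) * p(y|x2), restricted to |y| <= max_len."""
    total = 0.0
    for length in range(max_len + 1):
        for symbols in product(ALPHABET, repeat=length):
            y = "".join(symbols)
            total += cond_prob(y, x1) * cond_prob(y, x2)
    return total


if __name__ == "__main__":
    print(truncated_kernel("ab", "ab"))
    print(truncated_kernel("ab", "ba"))
```

The truncation costs time exponential in `max_len`, which is only workable for tiny alphabets and short strings. This cost is one reason an exact automata-based computation, as described in the abstract, is preferable in practice.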
