Weighted decomposition kernels

We introduce a family of kernels on discrete data structures within the general class of decomposition kernels. A weighted decomposition kernel (WDK) is computed by dividing objects into substructures indexed by a selector. Two substructures are then matched if their selectors satisfy an equality predicate, while the importance of the match is determined by a probability kernel on local distributions fitted on the substructures. Under reasonable assumptions, a WDK can be computed efficiently and can avoid combinatorial explosion of the feature space. We report experimental evidence that the proposed kernel is highly competitive with respect to more complex state-of-the-art methods on a set of problems in bioinformatics.

[1]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[2]  David Haussler,et al.  A Discriminative Framework for Detecting Remote Protein Homologies , 2000, J. Comput. Biol..

[3]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[4]  Shai Ben-David,et al.  Limitations of Learning Via Embeddings in Euclidean Half Spaces , 2003, J. Mach. Learn. Res..

[5]  Luc De Raedt,et al.  Molecular feature mining in HIV data , 2001, KDD '01.

[6]  Ashwin Srinivasan,et al.  The Predictive Toxicology Challenge 2000-2001 , 2001, Bioinform..

[7]  Zhirong Sun,et al.  Support vector machine approach for protein subcellular localization prediction , 2001, Bioinform..

[8]  Michael Collins,et al.  Convolution Kernels for Natural Language , 2001, NIPS.

[9]  Eleazar Eskin,et al.  The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.

[10]  Bernhard Schölkopf,et al.  A Kernel Approach for Learning from Almost Orthogonal Patterns , 2002, European Conference on Principles of Data Mining and Knowledge Discovery.

[11]  G. Karypis,et al.  Frequent sub-structure-based approaches for classifying chemical compounds , 2005, Third IEEE International Conference on Data Mining.

[12]  Hisashi Kashima,et al.  Marginalized Kernels Between Labeled Graphs , 2003, ICML.

[13]  Thomas Gärtner,et al.  A survey of kernels for structured data , 2003, SKDD.

[14]  Thomas Gärtner,et al.  On Graph Kernels: Hardness Results and Efficient Alternatives , 2003, COLT.

[15]  Dan Roth,et al.  On Kernel Methods for Relational Learning , 2003, ICML.

[16]  B. Rost,et al.  Better prediction of sub‐cellular localization by combining evolutionary and structural information , 2003, Proteins.

[17]  Thomas Gärtner,et al.  Cyclic pattern kernels for predictive graph mining , 2004, KDD.

[18]  Tatsuya Akutsu,et al.  Extensions of marginalized graph kernels , 2004, ICML.

[19]  Mehryar Mohri,et al.  Rational Kernels: Theory and Algorithms , 2004, J. Mach. Learn. Res..

[20]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[21]  Tony Jebara,et al.  Probability Product Kernels , 2004, J. Mach. Learn. Res..

[22]  Francesca Odone,et al.  Building kernels from binary strings for image matching , 2005, IEEE Transactions on Image Processing.