Shortest-path kernels on graphs

Data mining algorithms face the challenge of dealing with an increasing number of complex objects. For graph data, a whole toolbox of data mining algorithms becomes available once a kernel function is defined on instances of graphs. Graph kernels based on walks, subtrees and cycles in graphs have been proposed so far. In general, these kernels are either computationally expensive or limited in their expressiveness. We try to overcome this problem by defining expressive graph kernels based on paths. Because computing all paths or the longest paths in a graph is NP-hard, we propose graph kernels based on shortest paths. These kernels are computable in polynomial time, retain expressivity and remain positive definite. In experiments on the classification of graph models of proteins, our shortest-path kernels achieve significantly higher classification accuracy than walk-based kernels.
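The following is a minimal Python sketch of the idea under several assumptions of our own: graphs are undirected, carry node labels and nonnegative edge weights, and the base kernel is the simplest possible choice, a Dirac kernel that counts pairs of shortest paths with identical start label, end label and length. The graph encoding (a list of labels plus `(u, v, weight)` tuples) and the function names are hypothetical. Each graph is first transformed into its shortest-path graph with the Floyd–Warshall algorithm. Every shortest-path edge of one graph is then compared with every shortest-path edge of the other. Each graph has O(n^2) such edges for n nodes, so the double loop needs O(n^4) comparisons, which is polynomial rather than exponential.

```python
from itertools import product

INF = float("inf")

# Hypothetical encoding: a graph is (labels, edges), where labels[i] is the
# label of node i and edges is a list of undirected (u, v, weight) tuples.


def floyd_warshall(n, edges):
    """All-pairs shortest-path lengths; unreachable pairs stay at INF."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        # Undirected: store both directions, keeping the lightest parallel edge.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def sp_edges(labels, edges):
    """Edges of the shortest-path graph as (label_u, label_v, length) triples,
    one per ordered pair u != v that is connected by some path."""
    n = len(labels)
    dist = floyd_warshall(n, edges)
    return [
        (labels[u], labels[v], dist[u][v])
        for u, v in product(range(n), repeat=2)
        if u != v and dist[u][v] < INF
    ]


def shortest_path_kernel(g1, g2):
    """Sum of a Dirac kernel over all pairs of shortest-path edges:
    a pair contributes 1 iff start label, end label and length all match."""
    e1 = sp_edges(*g1)
    e2 = sp_edges(*g2)
    return sum(1 for a in e1 for b in e2 if a == b)


# Example: a labelled path C-C-O and a labelled triangle on the same labels.
g_path = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
g_tri = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1), (0, 2, 1)])
print(shortest_path_kernel(g_path, g_tri))  # 8
```

Because the base kernel is a Dirac kernel, the double sum equals the dot product of the two graphs' histograms of `(label, label, length)` triples, which is one way to see that the result is positive definite. Exact equality on lengths only makes sense for integer (for example, unweighted) path lengths. Counting ordered pairs counts every undirected shortest path twice, which scales the kernel without affecting positive definiteness.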
