Efficient graphlet kernels for large graph comparison

State-of-the-art graph kernels do not scale to large graphs with hundreds of nodes and thousands of edges. In this article we propose to compare graphs by counting graphlets, i.e., subgraphs with k nodes where k ∈ {3, 4, 5}. Because exhaustive enumeration of all graphlets is prohibitively expensive, we introduce two theoretically grounded speedup schemes: one based on sampling, and the other designed specifically for bounded-degree graphs. In our experimental evaluation, our novel kernels allow us to efficiently compare large graphs that cannot be tackled by existing graph kernels.
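
To make the sampling idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes k = 3, undirected graphs stored as adjacency dictionaries, and uniform sampling of node triples; the function names and the choice of a plain dot product between normalized histograms are illustrative assumptions. For three nodes, the number of edges in the induced subgraph (0 to 3) identifies the graphlet type up to isomorphism, so no explicit isomorphism test is needed here.

```python
import random
from itertools import combinations


def sample_3_graphlet_histogram(adj, num_samples=2000, seed=0):
    """Estimate the normalized frequencies of the four 3-node graphlet types.

    adj: dict mapping each node to the set of its neighbours (undirected).
    The type index is the number of edges among the three sampled nodes,
    which determines the isomorphism class when k = 3.
    (Hypothetical helper, not taken from the paper.)
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = [0, 0, 0, 0]
    for _ in range(num_samples):
        trio = rng.sample(nodes, 3)  # three distinct nodes, uniformly at random
        edges = sum(1 for u, v in combinations(trio, 2) if v in adj[u])
        counts[edges] += 1
    return [c / num_samples for c in counts]


def graphlet_kernel(adj_a, adj_b, num_samples=2000, seed=0):
    """Dot product of the two sampled graphlet histograms (an assumed choice)."""
    hist_a = sample_3_graphlet_histogram(adj_a, num_samples, seed)
    hist_b = sample_3_graphlet_histogram(adj_b, num_samples, seed)
    return sum(x * y for x, y in zip(hist_a, hist_b))


# Tiny example graphs: a triangle, a 3-node path, and three isolated nodes.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
empty = {0: set(), 1: set(), 2: set()}
```

On these three-node graphs every sample is the whole graph, so the histograms are exact; on larger graphs the histogram is only an estimate whose accuracy depends on `num_samples`. Extending the sketch to k = 4 or k = 5 would require a proper canonical form for each sampled subgraph, since edge counts no longer separate the isomorphism classes.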
