Learning with Similarity Functions on Graphs using Matchings of Geometric Embeddings

We develop and apply the Balcan-Blum-Srebro (BBS) theory of classification via similarity functions (which are not necessarily kernels) to the problem of graph classification. First, we place the BBS theory in the unifying framework of optimal transport theory. This also makes it possible to use coupling methods to establish the properties that the BBS definition requires of a good similarity function. Next, we apply this approach to graph classification via geometric embeddings such as the Laplacian, the pseudo-inverse Laplacian and Lovász orthogonal labellings. We consider the similarity function given by optimal and near-optimal matchings, with respect to Euclidean distance, of the corresponding high-dimensional embeddings of the graphs. We use optimal couplings to rigorously establish that this yields a "good" similarity measure in the BBS sense for two well-known families of graphs. Further, we show that on these families the similarity yields better classification accuracy in practice than matchings of other well-known graph embeddings. Finally, we perform an extensive empirical evaluation on benchmark data sets, where we show that classifying graphs using matchings of geometric embeddings outperforms previous state-of-the-art methods.
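The abstract describes the pipeline only at a high level. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the pseudo-inverse Laplacian embedding is obtained by factoring L^+ = U U^T through an eigendecomposition (rows of U are vertex vectors), that embeddings of different widths are zero-padded to a common dimension, and that the similarity between two graphs is the negative total Euclidean cost of an exact minimum-cost matching computed with SciPy's `linear_sum_assignment`. It also includes an empirical check in the spirit of the simplest goodness criterion from Balcan and Blum's theory: a similarity K is (ε, γ)-good if at least a 1 - ε fraction of examples have mean similarity to same-label examples at least γ larger than to differently labelled ones. That theory assumes similarities bounded in [-1, 1], which the raw matching cost here is not. The sketch does not align embeddings of different graphs, which are only defined up to an orthogonal transformation. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def pinv_laplacian_embedding(adj: np.ndarray) -> np.ndarray:
    """Return U whose rows u_i satisfy <u_i, u_j> = (L^+)_{ij}.

    Assumption: L^+ is factored through its eigendecomposition, so the
    embedding is only determined up to an orthogonal transformation.
    """
    lp = np.linalg.pinv(laplacian(adj))
    vals, vecs = np.linalg.eigh(lp)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return vecs * np.sqrt(vals)  # scale eigenvector column j by sqrt(vals[j])


def matching_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Negative total Euclidean cost of a min-cost matching of rows of x to rows of y.

    Assumptions: embeddings of different widths are zero-padded and no
    rotational alignment is attempted. If row counts differ, only
    min(n, m) rows are matched.
    """
    d = max(x.shape[1], y.shape[1])
    x = np.pad(x, ((0, 0), (0, d - x.shape[1])))
    y = np.pad(y, ((0, 0), (0, d - y.shape[1])))
    cost = cdist(x, y)  # pairwise Euclidean distances between vertex vectors
    rows, cols = linear_sum_assignment(cost)  # exact optimal assignment
    return -float(cost[rows, cols].sum())


def empirical_goodness(sim: np.ndarray, labels: np.ndarray, gamma: float) -> float:
    """Fraction of points whose mean similarity to same-label points exceeds
    their mean similarity to other-label points by at least gamma.

    Empirical analogue of (epsilon, gamma)-goodness: the returned value
    estimates 1 - epsilon. Each point is excluded from its own average.
    """
    n = len(labels)
    idx = np.arange(n)
    good = 0
    for i in range(n):
        same = (labels == labels[i]) & (idx != i)
        diff = labels != labels[i]
        if same.any() and diff.any():
            if sim[i, same].mean() >= sim[i, diff].mean() + gamma:
                good += 1
    return good / n


if __name__ == "__main__":
    # Toy data: two triangles (label 0) and two 3-vertex paths (label 1).
    tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
    path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    graphs = [tri, tri, path, path]
    labels = np.array([0, 0, 1, 1])
    emb = [pinv_laplacian_embedding(a) for a in graphs]
    sim = np.array([[matching_similarity(a, b) for b in emb] for a in emb])
    print(sim.round(3))
    print("estimated 1 - epsilon:", empirical_goodness(sim, labels, gamma=0.1))
```

In the toy run, identical graphs receive similarity 0 and every triangle-path pair receives a strictly negative score, so all four points pass the check at γ = 0.1. SciPy's solver computes an exact optimal matching; the near-optimal matchings mentioned in the abstract would swap in a faster approximate matching algorithm at this step.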
