Wasserstein Weisfeiler-Lehman Graph Kernels

Most graph kernels are instances of the class of $\mathcal{R}$-Convolution kernels, which measure the similarity of objects by comparing their substructures. Despite their empirical success, most graph kernels use a naive aggregation of the final set of substructures, usually a sum or average, thereby potentially discarding valuable information about the distribution of individual components. Furthermore, only a limited subset of these approaches can be extended to continuously attributed graphs. We propose a novel method that relies on the Wasserstein distance between the node feature vector distributions of two graphs, which allows us to find subtler differences in data sets by treating graphs as high-dimensional objects rather than as simple means. We further propose a Weisfeiler-Lehman inspired embedding scheme for graphs with continuous node attributes and weighted edges, enhance it with the computed Wasserstein distance, and thus improve the state-of-the-art prediction performance on several graph classification tasks.
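
The contrast between mean aggregation and a Wasserstein comparison of node feature distributions can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `wasserstein_uniform` and `mean_vector`, the toy node features, and the restriction to two graphs with the same number of nodes and uniform node weights are all assumptions made for illustration. Under that restriction an optimal transport plan can be taken to be a one-to-one matching of nodes, so a brute-force search over permutations gives the exact distance for very small graphs; a realistic implementation would use an optimal transport solver instead.

```python
from itertools import permutations
from math import dist


def wasserstein_uniform(xs, ys):
    """1-Wasserstein distance between two equal-size sets of node feature
    vectors, each node carrying the same mass (toy assumption)."""
    if len(xs) != len(ys):
        raise ValueError("this toy version needs the same number of nodes in both graphs")
    n = len(xs)
    # With uniform weights and equal sizes, some optimal transport plan is a
    # permutation, so we can search all one-to-one matchings (tiny n only).
    best = min(
        sum(dist(xs[i], ys[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n


def mean_vector(xs):
    """Naive aggregation: average of all node feature vectors."""
    n = len(xs)
    return tuple(sum(x[k] for x in xs) / n for k in range(len(xs[0])))


# Hypothetical node features for two small graphs (one 2-d vector per node).
g1 = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
g2 = [(1.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

same_mean = mean_vector(g1) == mean_vector(g2)  # the averages coincide
w_dist = wasserstein_uniform(g1, g2)            # the distributions do not
```

In this toy case both graphs have the same mean node feature vector, so an average-based aggregation cannot tell them apart, while the Wasserstein distance between the two node feature distributions is 2/3.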
