Optimal Transport for structured data with application on graphs

This work considers the problem of computing distances between structured objects such as undirected graphs, seen as probability distributions in a specific metric space. We consider a new transportation distance (i.e., one that minimizes the total cost of transporting probability mass) that unveils the geometric nature of the space of structured objects. Unlike the Wasserstein and Gromov-Wasserstein metrics, which focus respectively on features alone (by considering a metric on the feature space) or on structure alone (by seeing the structure as a metric space), our new distance exploits both types of information jointly, and is consequently called Fused Gromov-Wasserstein (FGW). After discussing its properties and computational aspects, we show results on a graph classification task, where our method outperforms both graph kernels and deep graph convolutional networks. Further exploiting the metric properties of FGW, we illustrate interesting geometric objects such as Fréchet means, or barycenters, of graphs and discuss them in a clustering context.
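To make the fusion of features and structure concrete, here is a minimal sketch, not the authors' implementation, of the FGW objective in the squared-loss case, (1 - alpha) * sum_ij M[i,j] pi[i,j] + alpha * sum_ijkl (C1[i,k] - C2[j,l])^2 pi[i,j] pi[k,l], minimized over couplings pi with a Frank-Wolfe (conditional gradient) loop, one standard choice for this kind of quadratic transport problem. It assumes two graphs with the same number of nodes, uniform node weights, symmetric structure matrices C1 and C2 (here shortest-path distances), and a feature cost matrix M of squared feature distances. All function names are hypothetical. The linear step brute-forces permutations, so it only suits toy graphs, and because the problem is non-convex the result is a local solution.

```python
import itertools

import numpy as np


def structure_term(C1, C2, pi):
    """(L x pi)[i, j] = sum_{k,l} (C1[i, k] - C2[j, l])**2 * pi[k, l],
    expanded so the 4-index tensor L is never built."""
    p, q = pi.sum(axis=1), pi.sum(axis=0)
    return (C1**2 @ p)[:, None] + (C2**2 @ q)[None, :] - 2 * C1 @ pi @ C2.T


def fgw_cost(M, C1, C2, pi, alpha):
    """FGW objective with squared losses: (1 - alpha) <M, pi> + alpha <L x pi, pi>."""
    return (1 - alpha) * np.sum(M * pi) + alpha * np.sum(structure_term(C1, C2, pi) * pi)


def uniform_lmo(grad):
    """Exact linear minimization over couplings with uniform weights 1/n.

    Assumes n == m and uniform weights, so a minimizer is a permutation matrix
    scaled by 1/n (Birkhoff). Brute force: only for tiny graphs."""
    n = grad.shape[0]
    rows = np.arange(n)
    best = min(itertools.permutations(range(n)), key=lambda s: grad[rows, list(s)].sum())
    G = np.zeros_like(grad)
    G[rows, list(best)] = 1.0 / n
    return G


def fgw_conditional_gradient(M, C1, C2, alpha=0.5, n_iter=100, tol=1e-10):
    """Frank-Wolfe on the FGW objective (hypothetical helper, local solution only)."""
    n = M.shape[0]
    pi = np.full((n, n), 1.0 / n**2)  # start from the product coupling p q^T
    for _ in range(n_iter):
        # Gradient of the objective; the factor 2 relies on C1, C2 being symmetric.
        grad = (1 - alpha) * M + 2 * alpha * structure_term(C1, C2, pi)
        delta = uniform_lmo(grad) - pi
        lin = np.sum(grad * delta)  # minus the Frank-Wolfe duality gap
        if -lin <= tol:
            break
        # delta has zero marginals, so f(pi + tau * delta) = f(pi) + lin * tau + quad * tau**2
        quad = -2 * alpha * np.sum((C1 @ delta @ C2.T) * delta)
        tau = min(1.0, -lin / (2 * quad)) if quad > 0 else 1.0  # exact line search on [0, 1]
        pi = pi + tau * delta
    return pi, fgw_cost(M, C1, C2, pi, alpha)


# Toy example: a 4-node path graph and a node-permuted copy of it.
C1 = np.abs(np.subtract.outer(np.arange(4), np.arange(4))).astype(float)  # shortest paths
x1 = np.array([0.0, 0.0, 1.0, 1.0])  # scalar node features
perm = [2, 0, 3, 1]
C2, x2 = C1[np.ix_(perm, perm)], x1[perm]
M = (x1[:, None] - x2[None, :]) ** 2  # squared feature distances
pi, cost = fgw_conditional_gradient(M, C1, C2, alpha=0.5)
print(np.round(pi * 4, 3), cost)
```

Setting alpha = 0 removes the structure term, leaving a transport problem on features only (Wasserstein type), while alpha = 1 keeps only the structure term (Gromov-Wasserstein type); intermediate values trade the two off.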
