Fast and Provably Convergent Algorithms for Gromov-Wasserstein in Graph Learning

In this paper, we study the design and analysis of a class of efficient algorithms for computing the Gromov-Wasserstein (GW) distance, tailored to large-scale graph learning tasks. Armed with the Luo-Tseng error bound condition (Luo & Tseng, 1992), we prove that the two proposed algorithms, Bregman Alternating Projected Gradient (BAPG) and hybrid Bregman Proximal Gradient (hBPG), are (linearly) convergent. Based on task-specific properties, our analysis further provides novel theoretical insights to guide the selection of the best-fit method. We then provide comprehensive experiments to validate the effectiveness of our methods on a host of tasks, including graph alignment, graph partition, and shape matching. In terms of both wall-clock time and modeling performance, the proposed methods achieve state-of-the-art results.

[1]  Mikael Johansson,et al.  A Fast and Accurate Splitting Method for Optimal Transport: Analysis and Implementation , 2021, ICLR.

[2]  Marco Cuturi,et al.  Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs , 2021, ICML.

[3]  N. Courty,et al.  Semi-relaxed Gromov Wasserstein divergence with applications on graphs , 2021, ArXiv.

[4]  Hanghang Tong,et al.  Balancing Consistency and Disparity in Network Alignment , 2021, KDD.

[5]  Nicolas Courty,et al.  Online Graph Dictionary Learning , 2021, ICML.

[6]  Hongyuan Zha,et al.  Learning Graphons via Structured Gromov-Wasserstein Barycenters , 2020, AAAI.

[7]  Samir Chowdhury,et al.  Generalized Spectral Clustering via Gromov-Wasserstein Learning , 2020, AISTATS.

[8]  Jundong Li,et al.  Unsupervised Graph Alignment with Wasserstein Distance Discriminator , 2021, KDD.

[9]  Jiawei Zhang,et al.  A Global Dual Error Bound and Its Application to the Analysis of Linearly Constrained Nonconvex Optimization , 2020, SIAM J. Optim.

[10]  Tara Abrishami,et al.  Geometry of Graph Partitions via Optimal Transport , 2019, SIAM J. Sci. Comput.

[11]  Marco Cuturi,et al.  Computational Optimal Transport: With Applications to Data Science , 2019.

[12]  Nicolas Courty,et al.  Sliced Gromov-Wasserstein , 2019, NeurIPS.

[13]  Lawrence Carin,et al.  Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching , 2019, NeurIPS.

[14]  Stefanie Jegelka,et al.  Learning Generative Models across Incomparable Spaces , 2019, ICML.

[15]  Hongyuan Zha,et al.  Gromov-Wasserstein Learning for Graph Matching and Node Embedding , 2019, ICML.

[16]  Samir Chowdhury,et al.  The Gromov-Wasserstein distance between networks and stable network invariants , 2018, Information and Inference: A Journal of the IMA.

[17]  Nicolas Courty,et al.  Optimal Transport for structured data with application on graphs , 2018, ICML.

[18]  Nicolas Courty,et al.  Fused Gromov-Wasserstein distance for structured objects: theoretical foundations and mathematical properties , 2018, Algorithms.

[19]  Guoyin Li,et al.  Calculus of the Exponent of Kurdyka–Łojasiewicz Inequality and Its Applications to Linear Convergence of First-Order Methods , 2016, Foundations of Computational Mathematics.

[20]  Marc Teboulle,et al.  A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications , 2017, Math. Oper. Res.

[21]  Anthony Man-Cho So,et al.  A unified approach to error bounds for structured convex optimization problems , 2015, Mathematical Programming.

[22]  Vladimir G. Kim,et al.  Entropic metric alignment for correspondence problems , 2016, ACM Trans. Graph.

[23]  Simon Lacoste-Julien,et al.  Convergence Rate of Frank-Wolfe for Non-Convex Objectives , 2016, ArXiv.

[24]  Gabriel Peyré,et al.  Gromov-Wasserstein Averaging of Kernel and Distance Matrices , 2016, ICML.

[25]  Alexandre M. Bayen,et al.  Efficient Bregman projections onto the simplex , 2015, 2015 54th IEEE Conference on Decision and Control (CDC).

[26]  Gabriel Peyré,et al.  Iterative Bregman Projections for Regularized Transportation Problems , 2014, SIAM J. Sci. Comput.

[27]  Facundo Mémoli,et al.  The Gromov-Wasserstein Distance: A Brief Overview , 2014, Axioms.

[28]  Zhi-Quan Luo,et al.  On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization , 2013, NIPS.

[29]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[30]  Martin Jaggi,et al.  Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization , 2013, ICML.

[31]  Benar Fux Svaiter,et al.  Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods , 2013, Math. Program.

[32]  Facundo Mémoli,et al.  Gromov–Wasserstein Distances and the Metric Approach to Object Matching , 2011, Found. Comput. Math.

[33]  James Bailey,et al.  Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance , 2010, J. Mach. Learn. Res.

[34]  Hédy Attouch,et al.  Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality , 2008, Math. Oper. Res.

[35]  Facundo Mémoli,et al.  Spectral Gromov-Wasserstein distances for shape matching , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[36]  Panos M. Pardalos,et al.  Quadratic Assignment Problem , 1997, Encyclopedia of Optimization.

[37]  Ben Taskar,et al.  Word Alignment via Quadratic Assignment , 2006, NAACL.

[38]  Guillermo Sapiro,et al.  Comparing point clouds , 2004, SGP '04.

[39]  Heinz H. Bauschke,et al.  Dykstra's algorithm with Bregman projections: A convergence proof , 2000.

[40]  Heinz H. Bauschke,et al.  Projection algorithms and monotone operators , 1996.

[41]  Paul Tseng,et al.  Error Bound and Convergence Analysis of Matrix Splitting Algorithms for the Affine Variational Inequality Problem , 1992, SIAM J. Optim.