On the Efficiency of Sinkhorn and Greenkhorn and Their Acceleration for Optimal Transport

We present several new complexity results for algorithms that approximately solve the optimal transport (OT) problem between two discrete probability measures with at most n atoms. First, we improve the complexity bound of a greedy variant of the Sinkhorn algorithm, known as the Greenkhorn algorithm, from Õ(n²ε⁻³) to Õ(n²ε⁻²). Notably, this matches the best known complexity bound of the Sinkhorn algorithm and sheds light on the superior practical performance of the Greenkhorn algorithm. Second, we generalize an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm [Dvurechensky et al., 2018] with mirror mapping φ and prove that the resulting adaptive primal-dual accelerated mirror descent (APDAMD) algorithm achieves a complexity bound of Õ(n²√δ ε⁻¹), where δ > 0 refers to the regularity of φ. We demonstrate that the complexity bound of Õ(min{n⁹ᐟ⁴ε⁻¹, n²ε⁻²}) claimed for the APDAGD algorithm is invalid and establish a new complexity bound of Õ(n⁵ᐟ²ε⁻¹). Moreover, we propose a deterministic accelerated Sinkhorn algorithm and prove that it achieves a complexity bound of Õ(n⁷ᐟ³ε⁻¹) by incorporating an estimate sequence. Therefore, the accelerated Sinkhorn algorithm outperforms the Sinkhorn and Greenkhorn algorithms in terms of 1/ε, and the APDAGD and accelerated alternating minimization [Guminov et al., 2021] algorithms in terms of n. Finally, we conduct experiments with the proposed algorithms on synthetic data and real images and demonstrate their efficiency via numerical results.
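For readers unfamiliar with the baseline that the bounds above refer to, the classical Sinkhorn iteration for entropic-regularized OT alternately rescales the rows and columns of a Gibbs kernel until the marginals match. The sketch below is a minimal illustration, not the paper's implementation; the function name `sinkhorn`, the fixed iteration count, and the choice of regularization strength `eta` are illustrative assumptions:

```python
import numpy as np

def sinkhorn(C, r, c, eta=0.5, n_iter=500):
    """Approximate the entropic-regularized OT plan between histograms r and c.

    C      : (n, n) cost matrix
    r, c   : row/column marginals, each summing to 1
    eta    : regularization strength (smaller -> closer to unregularized OT,
             but slower convergence and worse numerical conditioning)
    """
    K = np.exp(-C / eta)                  # Gibbs kernel
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)                   # rescale rows toward marginal r
        v = c / (K.T @ u)                 # rescale columns toward marginal c
    return u[:, None] * K * v[None, :]    # transport plan diag(u) K diag(v)
```

Since the column update runs last, the returned plan matches the column marginal c exactly and the row marginal r up to the convergence tolerance; in practice one iterates until the marginal violation falls below a target ε rather than for a fixed count. The Greenkhorn variant studied in the paper replaces these full row/column updates with greedy single-coordinate updates.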

[1] Marco Cuturi, et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.

[2] Marco Cuturi, et al. Subspace Robust Wasserstein Distances, 2019, ICML.

[3] Muni Sreenivas Pydi, et al. Adversarial Risk via Optimal Transport and Optimal Couplings, 2019, IEEE Transactions on Information Theory.

[4] Kent Quanrud, et al. Approximating Optimal Transport with Linear Programs, 2018, SOSA.

[5] L. Khachiyan, et al. On the Complexity of Nonnegative-Matrix Scaling, 1996.

[6] Yurii Nesterov, et al. Lectures on Convex Optimization, 2018.

[7] Gabriel Peyré, et al. A Smoothed Dual Approach for Variational Wasserstein Problems, 2015, SIAM J. Imaging Sci.

[8] Aaron Sidford, et al. Towards Optimal Running Times for Optimal Transport, 2018, ArXiv.

[9] F. Bach, et al. Sharp Asymptotic and Finite-Sample Rates of Convergence of Empirical Measures in Wasserstein Distance, 2017, Bernoulli.

[10] Alexander V. Gasnikov, et al. On a Combination of Alternating Minimization and Nesterov's Momentum, 2019, ICML.

[11] David B. Dunson, et al. Scalable Bayes via Barycenter in Wasserstein Space, 2015, J. Mach. Learn. Res.

[12] Daniel Cullina, et al. Lower Bounds on Adversarial Robustness from Optimal Transport, 2019, NeurIPS.

[13] Alexander Gasnikov, et al. Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn's Algorithm, 2018, ICML.

[14] Alessandro Rudi, et al. Massively Scalable Sinkhorn Distances via the Nyström Method, 2018, NeurIPS.

[15] A. Guillin, et al. On the Rate of Convergence in Wasserstein Distance of the Empirical Measure, 2013, arXiv:1312.2128.

[16] Jason Altschuler, et al. Near-Linear Time Approximation Algorithms for Optimal Transport via Sinkhorn Iteration, 2017, NIPS.

[17] Jonah Sherman, et al. Area-Convexity, ℓ∞ Regularization, and Undirected Multicommodity Flow, 2017, STOC.

[18] Aleksander Madry, et al. Matrix Scaling and Balancing via Box Constrained Newton's Method and Interior Point Methods, 2017, FOCS.

[19] Khai Nguyen, et al. Distributional Sliced-Wasserstein and Applications to Generative Modeling, 2020, ICLR.

[20] Gabriel Peyré, et al. Gromov-Wasserstein Averaging of Kernel and Distance Matrices, 2016, ICML.

[21] Michael I. Jordan, et al. Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter, 2019, AISTATS.

[22] Philip A. Knight, et al. The Sinkhorn-Knopp Algorithm: Convergence and Applications, 2008, SIAM J. Matrix Anal. Appl.

[23] Xin Guo, et al. Sparsemax and Relaxed Wasserstein for Topic Sparsity, 2018, WSDM.

[24] Axel Munk, et al. Optimal Transport: Fast Probabilistic Approximation with Exact Solvers, 2018, J. Mach. Learn. Res.

[25] Kevin Tian, et al. A Direct Õ(1/ε) Iteration Parallel Algorithm for Optimal Transport, 2019, NeurIPS.

[26] Bernhard Schölkopf, et al. Wasserstein Auto-Encoders, 2017, ICLR.

[27] L. Kantorovich. On the Translocation of Masses, 2006.

[28] Gabriel Peyré, et al. Stochastic Optimization for Large-Scale Optimal Transport, 2016, NIPS.

[29] Mark W. Schmidt, et al. Minimizing Finite Sums with the Stochastic Average Gradient, 2013, Mathematical Programming.

[30] Avi Wigderson, et al. Much Faster Algorithms for Matrix Scaling, 2017, FOCS.

[31] Nicolas Courty, et al. Optimal Transport for Domain Adaptation, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Darina Dvinskikh, et al. Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters, 2018, NeurIPS.

[33] Peter Richtárik, et al. Accelerated, Parallel, and Proximal Coordinate Descent, 2013, SIAM J. Optim.

[34] Gabriel Peyré, et al. Computational Optimal Transport, 2018, Found. Trends Mach. Learn.

[35] Yurii Nesterov, et al. Smooth Minimization of Non-Smooth Functions, 2005, Math. Program.

[36] Lin Xiao, et al. An Accelerated Randomized Proximal Coordinate Gradient Method and Its Application to Regularized Empirical Risk Minimization, 2015, SIAM J. Optim.

[37] X. Nguyen. Convergence of Latent Mixing Measures in Finite and Infinite Mixture Models, 2011, arXiv:1109.3250.

[38] Lénaïc Chizat, et al. Faster Wasserstein Distance Estimation with the Sinkhorn Divergence, 2020, NeurIPS.

[39] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[40] Richard Sinkhorn. Diagonal Equivalence to Matrices with Prescribed Row and Column Sums. II, 1967.

[41] Robert M. Gower, et al. Stochastic Algorithms for Entropy-Regularized Optimal Transport Problems, 2018, AISTATS.

[42] Rama Chellappa, et al. Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation, 2020, NeurIPS.

[43] Bahman Kalantari, et al. On the Complexity of General Matrix Scaling and Entropy Minimization via the RAS Algorithm, 2007, Math. Program.

[44] Michael I. Jordan, et al. Probabilistic Multilevel Clustering via Composite Transportation Distance, 2018, AISTATS.

[45] Sanjeev Khanna, et al. Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling, 2018, Mathematical Programming.

[46] Alessandro Rudi, et al. Approximating the Quadratic Transportation Metric in Near-Linear Time, 2018, ArXiv.

[47] Arnaud Doucet, et al. Fast Computation of Wasserstein Barycenters, 2013, ICML.

[48] C. Villani. Optimal Transport: Old and New, 2008.

[49] Nathaniel Lahn, et al. A Graph Theoretic Additive Approximation of Optimal Transport, 2019, NeurIPS.

[50] Roland Badeau, et al. Generalized Sliced Wasserstein Distances, 2019, NeurIPS.

[51] Volkan Cevher, et al. WASP: Scalable Bayes via Barycenters of Subset Posteriors, 2015, AISTATS.

[52] R. Dudley. The Speed of Mean Glivenko-Cantelli Convergence, 1969.

[53] Gabriel Peyré, et al. Sample Complexity of Sinkhorn Divergences, 2018, AISTATS.

[54] Julien Rabin, et al. Sliced and Radon Wasserstein Barycenters of Measures, 2014, Journal of Mathematical Imaging and Vision.

[55] Jonathan Weed, et al. Statistical Bounds for Entropic Optimal Transport: Sample Complexity and the Central Limit Theorem, 2019, NeurIPS.

[56] Gabriel Peyré, et al. Fast Dictionary Learning with a Smoothed Wasserstein Loss, 2016, AISTATS.

[57] Yin Tat Lee, et al. Path Finding Methods for Linear Programming: Solving Linear Programs in Õ(√rank) Iterations and Faster Algorithms for Maximum Flow, 2014, FOCS.

[58] Vivien Seguy, et al. Smooth and Sparse Optimal Transport, 2017, AISTATS.

[59] Aaron C. Courville, et al. Improved Training of Wasserstein GANs, 2017, NIPS.

[60] Steve Oudot, et al. Sliced Wasserstein Kernel for Persistence Diagrams, 2017, ICML.

[61] Michael I. Jordan, et al. On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms, 2019, ICML.

[62] Vahab S. Mirrokni, et al. Accelerating Greedy Coordinate Descent Methods, 2018, ICML.