On the Efficiency of the Sinkhorn and Greenkhorn Algorithms and Their Acceleration for Optimal Transport

We present new complexity results for several algorithms that approximately solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. First, we show that a greedy variant of the classical Sinkhorn algorithm, known as the \textit{Greenkhorn} algorithm, achieves the complexity bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$, which improves upon the previously best known bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$. Notably, this matches the best known complexity bound of the Sinkhorn algorithm and explains the superior performance of the Greenkhorn algorithm in practice. Furthermore, we generalize an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm with mirror mapping $\phi$ and show that the resulting \textit{adaptive primal-dual accelerated mirror descent} (APDAMD) algorithm achieves the complexity bound of $\widetilde{\mathcal{O}}(n^2\sqrt{\delta}\varepsilon^{-1})$, where $\delta>0$ depends on $\phi$. Using a simple counterexample, we point out that an existing complexity bound for the APDAGD algorithm is not valid in general, and we then establish the complexity bound of $\widetilde{\mathcal{O}}(n^{5/2}\varepsilon^{-1})$ by exploiting the connection between the APDAMD and APDAGD algorithms. Moreover, we introduce accelerated Sinkhorn and Greenkhorn algorithms that achieve the complexity bound of $\widetilde{\mathcal{O}}(n^{7/3}\varepsilon^{-1})$, which improves upon the $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$ bounds of the Sinkhorn and Greenkhorn algorithms in terms of $\varepsilon$. Experimental results on synthetic and real datasets demonstrate the favorable performance of the new algorithms in practice.
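To make the two baseline algorithms concrete, the following is a minimal NumPy sketch of the Sinkhorn iteration (alternating matrix scaling) and its greedy Greenkhorn variant for entropy-regularized OT. The function names, the parameter `eta` (regularization strength), and the use of the absolute marginal violation as the greedy score are illustrative assumptions, not the paper's exact formulation (Greenkhorn is usually analyzed with a Bregman-divergence score):

```python
import numpy as np

def sinkhorn(C, r, c, eta=5.0, n_iter=1000):
    """Entropy-regularized OT via Sinkhorn's alternating scaling.

    C: (n, n) cost matrix; r, c: target marginals (each summing to 1);
    eta: regularization strength. Returns the transport plan diag(u) K diag(v).
    """
    K = np.exp(-eta * C)                 # Gibbs kernel
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)                  # rescale to match row sums
        v = c / (K.T @ u)                # rescale to match column sums
    return u[:, None] * K * v[None, :]

def greenkhorn(C, r, c, eta=5.0, n_iter=5000):
    """Greedy variant: per iteration, update only the single row or
    column whose marginal currently violates its target the most."""
    K = np.exp(-eta * C)
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(n_iter):
        row_marg = u * (K @ v)           # current row sums of the plan
        col_marg = v * (K.T @ u)         # current column sums
        i = np.argmax(np.abs(row_marg - r))
        j = np.argmax(np.abs(col_marg - c))
        if abs(row_marg[i] - r[i]) >= abs(col_marg[j] - c[j]):
            u[i] = r[i] / (K[i] @ v)     # fix the worst row exactly
        else:
            v[j] = c[j] / (K[:, j] @ u)  # fix the worst column exactly
    return u[:, None] * K * v[None, :]
```

Each Greenkhorn iteration touches one row or column rather than all of them, which is the source of its practical advantage that the matching $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$ bound helps explain.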
