On the Efficiency of the Sinkhorn and Greenkhorn Algorithms and Their Acceleration for Optimal Transport

We present new complexity results for several algorithms that approximately solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. First, we show that a greedy variant of the classical Sinkhorn algorithm, known as the \textit{Greenkhorn} algorithm, achieves a complexity bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$, improving on the best previously known bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$. Notably, this matches the best known complexity bound for the Sinkhorn algorithm and explains the superior performance of the Greenkhorn algorithm in practice. Furthermore, we generalize an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm with a mirror mapping $\phi$ and show that the resulting \textit{adaptive primal-dual accelerated mirror descent} (APDAMD) algorithm achieves a complexity bound of $\widetilde{\mathcal{O}}(n^2\sqrt{\delta}\varepsilon^{-1})$, where $\delta>0$ depends on $\phi$. Using a simple counterexample, we show that an existing complexity bound for the APDAGD algorithm is not valid in general, and we then establish a complexity bound of $\widetilde{\mathcal{O}}(n^{5/2}\varepsilon^{-1})$ for it by exploiting the connection between the APDAMD and APDAGD algorithms. Moreover, we introduce accelerated Sinkhorn and Greenkhorn algorithms that achieve a complexity bound of $\widetilde{\mathcal{O}}(n^{7/3}\varepsilon^{-1})$, which improves on the $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$ bounds of the Sinkhorn and Greenkhorn algorithms in terms of $\varepsilon$. Experimental results on synthetic and real datasets demonstrate the favorable performance of the new algorithms in practice.
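To make the difference between the two baseline algorithms concrete, the following is a minimal sketch of plain (non-accelerated) Sinkhorn and Greenkhorn iterations for the entropic-regularized problem. It is an illustration under stated assumptions, not the authors' implementation: the function names, the regularization parameter `eta`, the fixed iteration counts, and the toy cost matrix are all hypothetical choices made here. Sinkhorn rescales all rows and then all columns in each iteration, whereas Greenkhorn greedily rescales only the single row or column whose marginal is most violated. For readability the Greenkhorn sketch recomputes the full plan at every step, which costs $O(n^2)$; an efficient implementation would instead update the row and column sums in $O(n)$ per step.

```python
import math

def rho(a, b):
    # Violation measure used for the greedy choice: zero iff a == b.
    return b - a + a * math.log(a / b)

def kernel(C, eta):
    # Gibbs kernel K_ij = exp(-C_ij / eta); eta is an assumed regularization parameter.
    return [[math.exp(-cij / eta) for cij in row] for row in C]

def plan(K, u, v):
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(len(v))] for i in range(len(u))]

def marginal_error(P, r, c):
    # l1 distance between the marginals of P and the targets (r, c).
    rows = [sum(row) for row in P]
    cols = [sum(col) for col in zip(*P)]
    return (sum(abs(a - b) for a, b in zip(rows, r))
            + sum(abs(a - b) for a, b in zip(cols, c)))

def sinkhorn(C, r, c, eta, iters):
    K = kernel(C, eta)
    n, m = len(r), len(c)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale every row, then every column.
        u = [r[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return plan(K, u, v)

def greenkhorn(C, r, c, eta, iters):
    K = kernel(C, eta)
    n, m = len(r), len(c)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        P = plan(K, u, v)  # recomputed for clarity, not efficiency
        rows = [sum(row) for row in P]
        cols = [sum(col) for col in zip(*P)]
        i = max(range(n), key=lambda k: rho(r[k], rows[k]))
        j = max(range(m), key=lambda k: rho(c[k], cols[k]))
        # Rescale only the single most violated row or column.
        if rho(r[i], rows[i]) >= rho(c[j], cols[j]):
            u[i] *= r[i] / rows[i]
        else:
            v[j] *= c[j] / cols[j]
    return plan(K, u, v)

# Toy instance (hypothetical data): two atoms on each side, strictly positive marginals.
C = [[0.0, 1.0], [1.0, 0.0]]
r = [0.5, 0.5]
c = [0.25, 0.75]
P_sinkhorn = sinkhorn(C, r, c, eta=0.5, iters=200)
P_greenkhorn = greenkhorn(C, r, c, eta=0.5, iters=1000)
```

Both routines return a plan whose marginals are close to `r` and `c`; `marginal_error` can be used to check how close. The sketch assumes strictly positive marginals so that the logarithm in `rho` is well defined, and it omits the final rounding step that is typically used to project the plan exactly onto the feasible set.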
