On the Efficiency of Sinkhorn and Greenkhorn and Their Acceleration for Optimal Transport

We present several new complexity results for algorithms that approximately solve the optimal transport (OT) problem between two discrete probability measures with at most n atoms. First, we improve the complexity bound of a greedy variant of the Sinkhorn algorithm, known as the Greenkhorn algorithm, from Õ(n²ε⁻³) to Õ(n²ε⁻²). Notably, this matches the best known complexity bound of the Sinkhorn algorithm and sheds light on the superior practical performance of the Greenkhorn algorithm. Second, we generalize an adaptive primal-dual accelerated gradient descent (APDAGD) algorithm [Dvurechensky et al., 2018] with mirror mapping φ and prove that the resulting adaptive primal-dual accelerated mirror descent (APDAMD) algorithm achieves a complexity bound of Õ(n²√δ ε⁻¹), where δ > 0 refers to the regularity of φ. We demonstrate that the complexity bound of Õ(min{n⁹ᐟ⁴ε⁻¹, n²ε⁻²}) claimed for the APDAGD algorithm is invalid and establish a new complexity bound of Õ(n⁵ᐟ²ε⁻¹). Moreover, we propose a deterministic accelerated Sinkhorn algorithm and prove that it achieves a complexity bound of Õ(n⁷ᐟ³ε⁻¹) by incorporating an estimate sequence. Therefore, the accelerated Sinkhorn algorithm outperforms the Sinkhorn and Greenkhorn algorithms in terms of 1/ε, and the APDAGD and accelerated alternating minimization [Guminov et al., 2021] algorithms in terms of n. Finally, we conduct experiments with the proposed algorithms on synthetic data and real images and demonstrate their efficiency via numerical results.
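For readers unfamiliar with the baseline that the bounds above refer to, the classical Sinkhorn iteration for entropic-regularized OT alternately rescales the rows and columns of a Gibbs kernel until the marginals match. The sketch below is a minimal illustration, not the paper's implementation; the function name `sinkhorn`, the fixed iteration count, and the choice of regularization strength `eta` are illustrative assumptions:

```python
import numpy as np

def sinkhorn(C, r, c, eta=0.5, n_iter=500):
    """Approximate the entropic-regularized OT plan between histograms r and c.

    C      : (n, n) cost matrix
    r, c   : row/column marginals, each summing to 1
    eta    : regularization strength (smaller -> closer to unregularized OT,
             but slower convergence and worse numerical conditioning)
    """
    K = np.exp(-C / eta)                  # Gibbs kernel
    u = np.ones_like(r)
    v = np.ones_like(c)
    for _ in range(n_iter):
        u = r / (K @ v)                   # rescale rows toward marginal r
        v = c / (K.T @ u)                 # rescale columns toward marginal c
    return u[:, None] * K * v[None, :]    # transport plan diag(u) K diag(v)
```

Since the column update runs last, the returned plan matches the column marginal c exactly and the row marginal r up to the convergence tolerance; in practice one iterates until the marginal violation falls below a target ε rather than for a fixed count. The Greenkhorn variant studied in the paper replaces these full row/column updates with greedy single-coordinate updates.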

[1] Marco Cuturi, et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.

[2] Marco Cuturi, et al. Subspace Robust Wasserstein Distances, 2019, ICML.

[3] Muni Sreenivas Pydi, et al. Adversarial Risk via Optimal Transport and Optimal Couplings, 2019, IEEE Transactions on Information Theory.

[4] Kent Quanrud, et al. Approximating Optimal Transport with Linear Programs, 2018, SOSA.

[5] L. Khachiyan, et al. On the Complexity of Nonnegative-Matrix Scaling, 1996.

[6] Yurii Nesterov, et al. Lectures on Convex Optimization, 2018.

[7] Gabriel Peyré, et al. A Smoothed Dual Approach for Variational Wasserstein Problems, 2015, SIAM J. Imaging Sci.

[8] Aaron Sidford, et al. Towards Optimal Running Times for Optimal Transport, 2018, ArXiv.

[9] F. Bach, et al. Sharp Asymptotic and Finite-Sample Rates of Convergence of Empirical Measures in Wasserstein Distance, 2017, Bernoulli.

[10] Alexander V. Gasnikov, et al. On a Combination of Alternating Minimization and Nesterov's Momentum, 2019, ICML.

[11] David B. Dunson, et al. Scalable Bayes via Barycenter in Wasserstein Space, 2015, J. Mach. Learn. Res.

[12] Daniel Cullina, et al. Lower Bounds on Adversarial Robustness from Optimal Transport, 2019, NeurIPS.

[13] Alexander Gasnikov, et al. Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn's Algorithm, 2018, ICML.

[14] Alessandro Rudi, et al. Massively Scalable Sinkhorn Distances via the Nyström Method, 2018, NeurIPS.

[15] A. Guillin, et al. On the Rate of Convergence in Wasserstein Distance of the Empirical Measure, 2013, arXiv:1312.2128.

[16] Jason Altschuler, et al. Near-Linear Time Approximation Algorithms for Optimal Transport via Sinkhorn Iteration, 2017, NIPS.

[17] Jonah Sherman, et al. Area-Convexity, ℓ∞ Regularization, and Undirected Multicommodity Flow, 2017, STOC.

[18] Aleksander Madry, et al. Matrix Scaling and Balancing via Box Constrained Newton's Method and Interior Point Methods, 2017, FOCS.

[19] Khai Nguyen, et al. Distributional Sliced-Wasserstein and Applications to Generative Modeling, 2020, ICLR.

[20] Gabriel Peyré, et al. Gromov-Wasserstein Averaging of Kernel and Distance Matrices, 2016, ICML.

[21] Michael I. Jordan, et al. Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter, 2019, AISTATS.

[22] Philip A. Knight, et al. The Sinkhorn-Knopp Algorithm: Convergence and Applications, 2008, SIAM J. Matrix Anal. Appl.

[23] Xin Guo, et al. Sparsemax and Relaxed Wasserstein for Topic Sparsity, 2018, WSDM.

[24] Axel Munk, et al. Optimal Transport: Fast Probabilistic Approximation with Exact Solvers, 2018, J. Mach. Learn. Res.

[25] Kevin Tian, et al. A Direct Õ(1/ε) Iteration Parallel Algorithm for Optimal Transport, 2019, NeurIPS.

[26] Bernhard Schölkopf, et al. Wasserstein Auto-Encoders, 2017, ICLR.

[27] L. Kantorovich. On the Translocation of Masses, 2006.

[28] Gabriel Peyré, et al. Stochastic Optimization for Large-Scale Optimal Transport, 2016, NIPS.

[29] Mark W. Schmidt, et al. Minimizing Finite Sums with the Stochastic Average Gradient, 2013, Mathematical Programming.

[30] Avi Wigderson, et al. Much Faster Algorithms for Matrix Scaling, 2017, FOCS.

[31] Nicolas Courty, et al. Optimal Transport for Domain Adaptation, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Darina Dvinskikh, et al. Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters, 2018, NeurIPS.

[33] Peter Richtárik, et al. Accelerated, Parallel, and Proximal Coordinate Descent, 2013, SIAM J. Optim.

[34] Gabriel Peyré, et al. Computational Optimal Transport, 2018, Found. Trends Mach. Learn.

[35] Yurii Nesterov, et al. Smooth Minimization of Non-Smooth Functions, 2005, Math. Program.

[36] Lin Xiao, et al. An Accelerated Randomized Proximal Coordinate Gradient Method and Its Application to Regularized Empirical Risk Minimization, 2015, SIAM J. Optim.

[37] X. Nguyen. Convergence of Latent Mixing Measures in Finite and Infinite Mixture Models, 2011, arXiv:1109.3250.

[38] Lénaïc Chizat, et al. Faster Wasserstein Distance Estimation with the Sinkhorn Divergence, 2020, NeurIPS.

[39] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[40] Richard Sinkhorn. Diagonal Equivalence to Matrices with Prescribed Row and Column Sums. II, 1967.

[41] Robert M. Gower, et al. Stochastic Algorithms for Entropy-Regularized Optimal Transport Problems, 2018, AISTATS.

[42] Rama Chellappa, et al. Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation, 2020, NeurIPS.

[43] Bahman Kalantari, et al. On the Complexity of General Matrix Scaling and Entropy Minimization via the RAS Algorithm, 2007, Math. Program.

[44] Michael I. Jordan, et al. Probabilistic Multilevel Clustering via Composite Transportation Distance, 2018, AISTATS.

[45] Sanjeev Khanna, et al. Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling, 2018, Mathematical Programming.

[46] Alessandro Rudi, et al. Approximating the Quadratic Transportation Metric in Near-Linear Time, 2018, ArXiv.

[47] Arnaud Doucet, et al. Fast Computation of Wasserstein Barycenters, 2013, ICML.

[48] C. Villani. Optimal Transport: Old and New, 2008.

[49] Nathaniel Lahn, et al. A Graph Theoretic Additive Approximation of Optimal Transport, 2019, NeurIPS.

[50] Roland Badeau, et al. Generalized Sliced Wasserstein Distances, 2019, NeurIPS.

[51] Volkan Cevher, et al. WASP: Scalable Bayes via Barycenters of Subset Posteriors, 2015, AISTATS.

[52] R. Dudley. The Speed of Mean Glivenko-Cantelli Convergence, 1969.

[53] Gabriel Peyré, et al. Sample Complexity of Sinkhorn Divergences, 2018, AISTATS.

[54] Julien Rabin, et al. Sliced and Radon Wasserstein Barycenters of Measures, 2014, Journal of Mathematical Imaging and Vision.

[55] Jonathan Weed, et al. Statistical Bounds for Entropic Optimal Transport: Sample Complexity and the Central Limit Theorem, 2019, NeurIPS.

[56] Gabriel Peyré, et al. Fast Dictionary Learning with a Smoothed Wasserstein Loss, 2016, AISTATS.

[57] Yin Tat Lee, et al. Path Finding Methods for Linear Programming: Solving Linear Programs in Õ(√rank) Iterations and Faster Algorithms for Maximum Flow, 2014, FOCS.

[58] Vivien Seguy, et al. Smooth and Sparse Optimal Transport, 2017, AISTATS.

[59] Aaron C. Courville, et al. Improved Training of Wasserstein GANs, 2017, NIPS.

[60] Steve Oudot, et al. Sliced Wasserstein Kernel for Persistence Diagrams, 2017, ICML.

[61] Michael I. Jordan, et al. On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms, 2019, ICML.

[62] Vahab S. Mirrokni, et al. Accelerating Greedy Coordinate Descent Methods, 2018, ICML.