Paper information - On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms

We provide theoretical analyses for two algorithms that solve the regularized optimal transport (OT) problem between two discrete probability measures with at most $n$ atoms. We show that the complexity bound of a greedy variant of the classical Sinkhorn algorithm, known as the \emph{Greenkhorn algorithm}, can be improved to $\widetilde{\mathcal{O}}(n^2\varepsilon^{-2})$ from the best previously known bound of $\widetilde{\mathcal{O}}(n^2\varepsilon^{-3})$. Notably, this matches the best known complexity bound for the Sinkhorn algorithm and helps explain why the Greenkhorn algorithm can outperform the Sinkhorn algorithm in practice. Our proof technique, which is based on a primal-dual formulation and a novel upper bound for the dual solution, also leads to a new class of algorithms that we refer to as \emph{adaptive primal-dual accelerated mirror descent} (APDAMD) algorithms. We prove that the complexity of these algorithms is $\widetilde{\mathcal{O}}(n^2\sqrt{\delta}\varepsilon^{-1})$, where $\delta > 0$ is the inverse of the strong convexity modulus of the Bregman divergence with respect to $\|\cdot\|_\infty$. This implies that the APDAMD algorithm is faster than the Sinkhorn and Greenkhorn algorithms in terms of $\varepsilon$. Experimental results on synthetic and real datasets demonstrate the favorable performance of the Greenkhorn and APDAMD algorithms in practice.
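
To make the greedy idea concrete, the following is a minimal pure-Python sketch of a Greenkhorn-style iteration, written from the standard description of the algorithm rather than taken from the paper. The function name `greenkhorn`, the parameter names `eta` (regularization strength), `tol`, and `max_iter`, and the stopping rule are illustrative assumptions. Where Sinkhorn rescales all rows and then all columns, this sketch rescales only the single row or column whose marginal is furthest from its target, as measured by the gap $\rho(a, b) = b - a + a\log(a/b)$. It assumes strictly positive target marginals and omits the final rounding step needed to obtain an exactly feasible transport plan.

```python
import math


def rho(a, b):
    # Gap b - a + a*log(a/b); nonnegative, and zero only when a == b (a, b > 0).
    return b - a + a * math.log(a / b)


def greenkhorn(C, r, c, eta=0.1, tol=1e-6, max_iter=100000):
    """Greedy matrix scaling sketch (names and defaults are assumptions).

    C: cost matrix as a list of lists; r, c: strictly positive target
    row and column marginals, each summing to 1.
    """
    n, m = len(r), len(c)
    # Gibbs kernel exp(-C / eta), normalized to total mass 1.
    B = [[math.exp(-C[i][j] / eta) for j in range(m)] for i in range(n)]
    total = sum(sum(row_vals) for row_vals in B)
    B = [[v / total for v in row_vals] for row_vals in B]
    row = [sum(B[i]) for i in range(n)]
    col = [sum(B[i][j] for i in range(n)) for j in range(m)]

    for _ in range(max_iter):
        # Stop once the l1 violation of both marginals is small enough.
        err = sum(abs(row[i] - r[i]) for i in range(n))
        err += sum(abs(col[j] - c[j]) for j in range(m))
        if err <= tol:
            break
        # Greedy choice: the worst row and the worst column.
        i_star = max(range(n), key=lambda i: rho(r[i], row[i]))
        j_star = max(range(m), key=lambda j: rho(c[j], col[j]))
        if rho(r[i_star], row[i_star]) >= rho(c[j_star], col[j_star]):
            # Rescale one row so that its sum matches r[i_star].
            s = r[i_star] / row[i_star]
            for j in range(m):
                old = B[i_star][j]
                B[i_star][j] = old * s
                col[j] += B[i_star][j] - old
            row[i_star] = r[i_star]
        else:
            # Rescale one column so that its sum matches c[j_star].
            s = c[j_star] / col[j_star]
            for i in range(n):
                old = B[i][j_star]
                B[i][j_star] = old * s
                row[i] += B[i][j_star] - old
            col[j_star] = c[j_star]
    return B
```

Each iteration touches a single row or column, so it costs $O(n)$ arithmetic operations instead of the $O(n^2)$ of a full Sinkhorn sweep. Calling `greenkhorn(C, r, c, eta=0.5)` on a small cost matrix returns a strictly positive matrix whose row and column sums approximate `r` and `c` to within `tol` in $\ell_1$ norm.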

Michael I. Jordan | Tianyi Lin | Nhat Ho
