Accelerated Primal-Dual Coordinate Descent for Computational Optimal Transport

We propose and analyze a novel accelerated primal-dual coordinate descent framework for computing the optimal transport (OT) distance between two discrete probability distributions. First, we introduce the accelerated primal-dual randomized coordinate descent (APDRCD) algorithm for computing OT. We then prove a complexity upper bound of $\widetilde{\mathcal{O}}(\frac{n^{5/2}}{\varepsilon})$ for the APDRCD method for approximating the OT distance, where $n$ is the number of atoms of the probability measures and $\varepsilon > 0$ is the desired accuracy. This upper bound matches the best known complexities of the adaptive primal-dual accelerated gradient descent (APDAGD) and adaptive primal-dual accelerated mirror descent (APDAMD) algorithms. In its dependence on the desired accuracy $\varepsilon$, it also improves on the complexities of the Sinkhorn and Greenkhorn algorithms, which are of the order $\widetilde{\mathcal{O}}(\frac{n^{2}}{\varepsilon^2})$. Furthermore, we propose a greedy version of the APDRCD algorithm, which we refer to as the accelerated primal-dual greedy coordinate descent (APDGCD) algorithm, and demonstrate that it performs better in practice than the APDRCD algorithm. Extensive experimental studies demonstrate the favorable performance of the APDRCD and APDGCD algorithms over state-of-the-art primal-dual algorithms for OT in the literature.
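To make the comparison between the two rates concrete, the following minimal sketch evaluates both bounds numerically. It is an illustration only: it drops the logarithmic factors hidden in $\widetilde{\mathcal{O}}(\cdot)$ and all constants, and the function names `apdrcd_bound`, `sinkhorn_bound`, and `bound_ratio` are hypothetical helpers introduced here, not part of any library. Under these simplifications the ratio of the two bounds is $1/(\varepsilon\sqrt{n})$, so the $n^{5/2}/\varepsilon$ rate is the smaller one whenever $\varepsilon\sqrt{n} < 1$.

```python
import math


def apdrcd_bound(n: int, eps: float) -> float:
    """n^(5/2) / eps, with log factors and constants dropped (assumption)."""
    return n ** 2.5 / eps


def sinkhorn_bound(n: int, eps: float) -> float:
    """n^2 / eps^2, with log factors and constants dropped (assumption)."""
    return n ** 2 / eps ** 2


def bound_ratio(n: int, eps: float) -> float:
    """Ratio of the two simplified bounds; algebraically 1 / (eps * sqrt(n))."""
    return sinkhorn_bound(n, eps) / apdrcd_bound(n, eps)


if __name__ == "__main__":
    # Print the ratio for a few accuracies at a fixed number of atoms.
    for eps in (0.1, 0.01, 0.001):
        print(f"n=100, eps={eps}: ratio = {bound_ratio(100, eps):.1f}")
```

A ratio above 1 means the simplified $n^{5/2}/\varepsilon$ bound is smaller. Because constants and log factors are ignored, these numbers indicate scaling only and say nothing about measured running time.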
