On the Complexity of Approximating Wasserstein Barycenters

We study the complexity of approximating the Wasserstein barycenter of m discrete measures, or histograms of size n, by contrasting two alternative approaches that use entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm, for which our novel analysis gives a complexity bound proportional to mn^2/ε^2 to approximate the original non-regularized barycenter. The second approach is based on accelerated gradient descent and attains a complexity proportional to mn^2/ε. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to ε, which makes both algorithms unstable when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method that uses IBP at each iteration to perform the proximal step. We also consider the scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.
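To make the IBP approach concrete, the following is a minimal sketch of the commonly used scaling form of Iterative Bregman Projections for an entropy-regularized barycenter. It is an illustration under stated assumptions, not the authors' implementation: the function name `ibp_barycenter`, its parameters, and the choice of a squared-distance cost on a one-dimensional grid are all hypothetical, and the regularization parameter `gamma` is fixed by hand rather than tied to ε as in the analysis.

```python
import numpy as np

def ibp_barycenter(hists, cost, gamma, weights=None, n_iter=200):
    """Entropy-regularized barycenter via IBP scaling updates (illustrative sketch).

    hists   : (m, n) array, each row a histogram summing to 1
    cost    : (n, n) ground cost matrix
    gamma   : entropic regularization parameter (assumed > 0)
    weights : barycentric weights summing to 1 (uniform if None)
    """
    hists = np.asarray(hists, dtype=float)
    m, n = hists.shape
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    K = np.exp(-cost / gamma)          # Gibbs kernel
    u = np.ones((m, n))                # one scaling vector per input measure
    for _ in range(n_iter):
        v = hists / (u @ K)            # project onto the fixed marginals: v_k = p_k / (K^T u_k)
        Kv = v @ K.T                   # row k holds K v_k
        q = np.exp(w @ np.log(u * Kv)) # weighted geometric mean of the free marginals
        u = q / Kv                     # project onto the shared-marginal constraint
    return q

# Hypothetical example: two point masses at the ends of a 5-point grid.
x = np.arange(5, dtype=float)
C = (x[:, None] - x[None, :]) ** 2     # squared-distance cost (an assumption)
p = np.array([[1.0, 0, 0, 0, 0],
              [0, 0, 0, 0, 1.0]])
q = ibp_barycenter(p, C, gamma=1.0)
print(q)
```

In this toy example the regularized barycenter concentrates around the midpoint of the grid. Shrinking `gamma` sharpens the result but drives entries of the kernel `K` toward zero, which is the kind of instability at high accuracy that the abstract refers to.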
