Gradient descent algorithms for Bures-Wasserstein barycenters

We study first-order methods for computing the barycenter of a probability distribution $P$ over the space of probability measures with finite second moment. We develop a framework to derive global rates of convergence for both gradient descent and stochastic gradient descent, despite the fact that the barycenter functional is not geodesically convex. Our analysis overcomes this technical hurdle by employing a Polyak-Łojasiewicz (PL) inequality and relies on tools from optimal transport and metric geometry. We then establish a PL inequality when $P$ is supported on the Bures-Wasserstein manifold of Gaussian probability measures. This yields the first global rates of convergence for first-order methods in this context.
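To make the Gaussian setting concrete, the following is a minimal sketch of gradient descent for the barycenter of finitely many centered Gaussians, each identified with its covariance matrix. It is an illustration under stated assumptions, not necessarily the exact scheme analyzed here: the function names, the identity initialization, the `step` and `iters` parameters, and the restriction to centered Gaussians with positive definite covariances are all choices made for this example. The update used is the commonly stated one for this setting, $S \leftarrow M S M$ with $M = (1-\eta) I + \eta \bar{T}$, where $\bar{T}$ averages the optimal transport maps from $N(0, S)$ to each input Gaussian.

```python
import numpy as np


def sqrtm_psd(a):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def transport_map(s, sigma):
    """Symmetric matrix T with T s T = sigma.

    For positive definite s, x -> T x is the optimal transport map
    from N(0, s) to N(0, sigma).
    """
    r = sqrtm_psd(s)
    r_inv = np.linalg.inv(r)
    return r_inv @ sqrtm_psd(r @ sigma @ r) @ r_inv


def bw_barycenter_gd(sigmas, step=1.0, iters=100):
    """Gradient descent sketch for the barycenter of N(0, sigma_i), equal weights.

    Assumes every sigma_i is symmetric positive definite. The names and the
    identity initialization are hypothetical choices for this example.
    """
    d = sigmas[0].shape[0]
    s = np.eye(d)
    for _ in range(iters):
        # Average of the optimal maps from the current iterate to each input.
        t_bar = sum(transport_map(s, sig) for sig in sigmas) / len(sigmas)
        # Move along the geodesic in the direction of the averaged map.
        m = (1.0 - step) * np.eye(d) + step * t_bar
        s = m @ s @ m
    return s


if __name__ == "__main__":
    covs = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
    print(bw_barycenter_gd(covs))
```

With `step=1.0` the update reduces to a fixed-point iteration on the averaged transport map. For commuting covariances, as in the usage example, the barycenter covariance is the square of the average of the matrix square roots, which gives a simple sanity check.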
