Estimating Barycenters of Measures in High Dimensions

Barycentric averaging is a principled way of summarizing populations of measures. Existing algorithms for estimating barycenters typically parametrize them as weighted sums of Diracs and optimize their weights and/or locations. However, these approaches do not scale to high-dimensional settings because of the curse of dimensionality. In this paper, we propose a scalable and general algorithm for estimating barycenters of measures in high dimensions. The key idea is to turn the optimization over measures into an optimization over generative models, introducing inductive biases that allow the method to scale while still accurately estimating barycenters. We prove local convergence under mild assumptions on the discrepancy, showing that the approach is well-posed. We demonstrate that our method is fast, achieves good performance on low-dimensional problems, and scales to high-dimensional settings. In particular, our approach is the first to be used to estimate barycenters in thousands of dimensions.
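To make the key idea concrete, here is a minimal sketch of replacing the optimization over measures with an optimization over the parameters of a generative model. It is not the authors' implementation. It assumes a deliberately tiny generator, a learned shift g_theta(z) = theta + z applied to Gaussian noise, and a Gaussian-kernel squared MMD as the discrepancy; the names `fit_barycenter_shift`, `objective_grad`, and `make_gaussian_sampler`, and all hyperparameters, are hypothetical. A neural generator and automatic differentiation would take the place of the hand-derived gradient in a realistic setting.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix k(x_n, y_m) between the rows of x and y."""
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def objective_grad(theta, noise, batches, weights, bandwidth):
    """Stochastic gradient in theta of sum_i w_i * MMD^2(g_theta # noise, mu_i).

    For the shift generator g_theta(z) = theta + z, only the cross term
    -2 * E[k(g_theta(z), y)] depends on theta, so it is the only term
    differentiated here.
    """
    x = theta + noise                                  # generator samples, (n, d)
    grad = np.zeros_like(theta)
    for w, y in zip(weights, batches):
        k = gaussian_kernel(x, y, bandwidth)           # (n, m)
        diff = x[:, None, :] - y[None, :, :]           # (n, m, d)
        grad += w * (2.0 / bandwidth ** 2) * (k[:, :, None] * diff).mean(axis=(0, 1))
    return grad


def fit_barycenter_shift(samplers, weights, theta0, steps=300, lr=1.0,
                         batch=256, bandwidth=2.0, seed=0):
    """Stochastic gradient descent on the generator parameter theta."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        noise = rng.standard_normal((batch, theta.size))    # fresh latent noise
        batches = [sample(rng, batch) for sample in samplers]  # fresh data batches
        theta -= lr * objective_grad(theta, noise, batches, weights, bandwidth)
    return theta


def make_gaussian_sampler(mean):
    """Hypothetical input measure: an isotropic Gaussian centred at `mean`."""
    mean = np.asarray(mean, dtype=float)
    return lambda rng, n: mean + rng.standard_normal((n, mean.size))


# Two input measures placed symmetrically around the origin, equal weights.
samplers = [make_gaussian_sampler([-1.0, 0.0, 0.0]),
            make_gaussian_sampler([1.0, 0.0, 0.0])]
theta_hat = fit_barycenter_shift(samplers, weights=[0.5, 0.5],
                                 theta0=[1.5, 1.0, -1.0])
print(theta_hat)
```

By symmetry, the learned shift should settle near the origin, up to stochastic-gradient noise. The bandwidth matters in this toy setup: if it is too small relative to the distance between the two inputs, the objective develops two local minima and gradient descent can settle near one of the inputs instead of the midpoint.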
