Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach populates the support of the barycenter incrementally, without pre-allocating a fixed support. We consider discrete as well as continuous distributions and prove convergence rates of the proposed algorithm in both settings. The key elements of our analysis are a new result showing that, on compact domains, the Sinkhorn divergence has a Lipschitz continuous gradient with respect to the Total Variation norm, and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice.
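The abstract describes the method only at a high level. As an illustration, here is a minimal pure-Python sketch of a Frank-Wolfe loop for a Sinkhorn barycenter on the real line. It rests on assumptions that are not taken from the paper: the cost is c(x, y) = |x - y|^2 / 2; the divergence is the debiased form S_eps(alpha, beta) = OT_eps(alpha, beta) - OT_eps(alpha, alpha)/2 - OT_eps(beta, beta)/2; the step size is 2/(k+2); and the linear minimization step is solved by brute force over a fixed candidate `grid`. At each iterate alpha_k, the gradient of sum_j omega_j S_eps(alpha, beta_j) is taken to be the function sum_j omega_j f_j - p, where f_j is the Sinkhorn potential of (alpha_k, beta_j) and p is the symmetric potential of (alpha_k, alpha_k). Both are extended off the support by the soft c-transform. Its minimizer over the candidates gives the Dirac that Frank-Wolfe adds to the support. All names (`softmin`, `sinkhorn_potentials`, `symmetric_potential`, `frank_wolfe_barycenter`, and so on) are hypothetical, and the sketch does not reproduce the paper's guarantees or experiments.

```python
import math


def cost(x, y):
    # Assumed ground cost: c(x, y) = |x - y|^2 / 2 on the real line.
    return 0.5 * (x - y) ** 2


def softmin(x, pts, wts, pot, eps):
    # Soft c-transform -eps * log sum_j w_j exp((pot_j - c(x, y_j)) / eps),
    # computed with the log-sum-exp trick. It can be evaluated at any x, which
    # is how potentials are extended beyond the support.
    vals = [math.log(w) + (p - cost(x, y)) / eps for y, w, p in zip(pts, wts, pot)]
    m = max(vals)
    return -eps * (m + math.log(sum(math.exp(v - m) for v in vals)))


def sinkhorn_potentials(xs, a, ys, b, eps, iters=100):
    # Alternating log-domain Sinkhorn updates for OT_eps(alpha, beta).
    g = [0.0] * len(ys)
    for _ in range(iters):
        f = [softmin(x, ys, b, g, eps) for x in xs]
        g = [softmin(y, xs, a, f, eps) for y in ys]
    return f, g


def ot_eps(xs, a, ys, b, eps):
    # Dual value <alpha, f> + <beta, g>; the entropic term vanishes after a g-update.
    f, g = sinkhorn_potentials(xs, a, ys, b, eps)
    return sum(ai * fi for ai, fi in zip(a, f)) + sum(bj * gj for bj, gj in zip(b, g))


def sinkhorn_divergence(xs, a, ys, b, eps):
    # Debiased Sinkhorn divergence (assumed definition, see lead-in).
    return (ot_eps(xs, a, ys, b, eps)
            - 0.5 * ot_eps(xs, a, xs, a, eps)
            - 0.5 * ot_eps(ys, b, ys, b, eps))


def symmetric_potential(xs, a, eps, iters=100):
    # Symmetric Sinkhorn for OT_eps(alpha, alpha) with averaged updates.
    p = [0.0] * len(xs)
    for _ in range(iters):
        p = [0.5 * (pi + softmin(x, xs, a, p, eps)) for pi, x in zip(p, xs)]
    return p


def frank_wolfe_barycenter(targets, omegas, grid, eps=0.1, n_iter=10):
    # targets: list of (points, weights); omegas: barycentric weights summing to 1.
    support, weights = [grid[len(grid) // 2]], [1.0]  # alpha_0: a single Dirac
    for k in range(n_iter):
        # Target-side potentials g_j and symmetric potential p at alpha_k.
        g_list = [sinkhorn_potentials(support, weights, ys, b, eps)[1] for ys, b in targets]
        p = symmetric_potential(support, weights, eps)

        def first_variation(x):
            # sum_j omega_j f_j(x) - p(x), with f_j and p extended to x by softmin.
            fx = sum(w * softmin(x, ys, b, g, eps)
                     for w, (ys, b), g in zip(omegas, targets, g_list))
            return fx - softmin(x, support, weights, p, eps)

        # Linear minimization oracle restricted to Diracs at the grid points.
        x_new = min(grid, key=first_variation)
        gamma = 2.0 / (k + 2)
        weights = [(1 - gamma) * w for w in weights]
        if x_new in support:
            weights[support.index(x_new)] += gamma
        else:
            support.append(x_new)
            weights.append(gamma)
        # Drop atoms whose weight vanished (happens at k = 0, where gamma = 1).
        kept = [(x, w) for x, w in zip(support, weights) if w > 0]
        support, weights = [x for x, _ in kept], [w for _, w in kept]
    return support, weights


if __name__ == "__main__":
    grid = [i / 20 for i in range(-10, 35)]  # candidate points in [-0.5, 1.7]
    targets = [([0.0, 0.2], [0.5, 0.5]), ([1.0, 1.2], [0.5, 0.5])]
    print(frank_wolfe_barycenter(targets, [0.5, 0.5], grid, eps=0.1, n_iter=8))
```

Each iteration adds at most one support point, which matches the abstract's "incremental support" description. The brute-force grid keeps the oracle exact on the candidate set but scales poorly with dimension. A local optimizer started from several points is a natural substitute, though it only returns approximate minimizers.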