Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed algorithm in both settings. Key elements of our analysis are a new result showing that the Sinkhorn divergence on compact domains has Lipschitz continuous gradient with respect to the Total Variation and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice.

Massimiliano Pontil | Carlo Ciliberto | Saverio Salzo | Giulia Luise
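The incremental-support idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: to stay short, it swaps the Sinkhorn divergence for a squared MMD with a Gaussian kernel, and it restricts the search for new atoms to a fixed grid of candidate points. The function names, the bandwidth, the grid, and the toy targets are all assumptions made for illustration. What it does show is the Frank-Wolfe pattern itself: each iteration picks the single Dirac that minimizes the linearized objective and mixes it into the current iterate with step size 2/(k+2), so the support grows by at most one point per step.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=0.5):
    """Gaussian kernel matrix between 1-D point arrays x and y (bandwidth is an assumption)."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def objective(support, weights, targets, mix):
    """sum_j mix[j] * MMD^2(alpha, beta_j), with alpha = sum_i weights[i] * delta_{support[i]}.

    Assumes mix sums to 1. This surrogate stands in for the Sinkhorn divergence.
    """
    total = weights @ gaussian_kernel(support, support) @ weights
    for (y, b), w in zip(targets, mix):
        cross = weights @ gaussian_kernel(support, y) @ b
        total += w * (b @ gaussian_kernel(y, y) @ b - 2.0 * cross)
    return float(total)

def frank_wolfe_barycenter(targets, mix, candidates, x0, n_iter=50):
    """Frank-Wolfe over probability measures; the support starts at {x0} and grows as needed."""
    support = np.array([x0])
    weights = np.array([1.0])
    # Mixture of the target mean embeddings, evaluated on the candidate points.
    target_term = sum(w * gaussian_kernel(candidates, y) @ b
                      for (y, b), w in zip(targets, mix))
    for k in range(n_iter):
        # Linearized objective (first variation) evaluated at every candidate point.
        grad = 2.0 * (gaussian_kernel(candidates, support) @ weights - target_term)
        # Linear minimization step: the best single Dirac among the candidates.
        x_new = candidates[np.argmin(grad)]
        gamma = 2.0 / (k + 2.0)  # open-loop step size
        weights = (1.0 - gamma) * weights
        hit = np.isclose(support, x_new)
        if hit.any():
            weights[hit] += gamma  # atom already present: add mass to it
        else:
            support = np.append(support, x_new)  # new atom: the support grows by one
            weights = np.append(weights, gamma)
        keep = weights > 0  # drop atoms whose mass has become zero
        support, weights = support[keep], weights[keep]
    return support, weights

# Toy problem: two discrete targets on the real line, equal barycenter weights.
candidates = np.linspace(-3.0, 3.0, 61)
targets = [(np.array([-2.0, -1.0]), np.array([0.5, 0.5])),
           (np.array([1.0, 2.0]), np.array([0.5, 0.5]))]
mix = [0.5, 0.5]

start = objective(np.array([0.0]), np.array([1.0]), targets, mix)
support, weights = frank_wolfe_barycenter(targets, mix, candidates, x0=0.0, n_iter=50)
end = objective(support, weights, targets, mix)
print(len(support), weights.sum(), start, end)
```

In the paper's setting the linear minimization would run over the whole domain and use Sinkhorn potentials rather than kernel evaluations; the fixed candidate grid here is only a convenience for the sketch.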
