Online Sinkhorn: Optimal Transport distances from sample streams

Optimal Transport (OT) distances are now routinely used as loss functions in machine learning tasks. Yet computing OT distances between arbitrary (i.e., not necessarily discrete) probability distributions remains an open problem. This paper introduces a new online estimator of entropy-regularized OT distances between two such arbitrary distributions. It uses streams of samples from both distributions to iteratively enrich a non-parametric representation of the transportation plan. Unlike the classic Sinkhorn algorithm, which operates on a fixed set of samples, our method draws new samples at each iteration, which enables consistent estimation of the true regularized OT distance. We provide a theoretical analysis of the convergence of the online Sinkhorn algorithm, showing a nearly O(1/n) asymptotic sample complexity for the iterate sequence. We validate our method on synthetic data in 1 to 10 dimensions and on real 3D shape data.
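
For context, the classic Sinkhorn algorithm that the abstract takes as its point of comparison can be sketched as follows. This is only a minimal illustration of the fixed-sample baseline, not the online estimator introduced in the paper. The function name `sinkhorn_fixed_samples`, the squared Euclidean cost, the uniform sample weights, and the default parameter values are all assumptions made for the example.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_fixed_samples(x, y, eps=0.5, n_iter=200):
    """Log-domain Sinkhorn between two FIXED point clouds x (n, d) and y (m, d).

    Assumptions for this sketch: cost C(x, y) = ||x - y||^2 / 2 and
    uniform weights on both sample sets.
    Returns the dual potentials f, g, the transport plan, and the dual
    objective <a, f> + <b, g>, which estimates the regularized OT value
    between the two empirical measures.
    """
    n, m = len(x), len(y)
    cost = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))  # log of uniform weights on x
    log_b = np.full(m, -np.log(m))  # log of uniform weights on y
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        # Alternate soft-min updates of the two potentials.
        f = -eps * logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    plan = np.exp(log_plan)
    value = np.mean(f) + np.mean(g)  # dual objective with uniform weights
    return f, g, plan, value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=(30, 2))
    y = rng.uniform(size=(40, 2))
    f, g, plan, value = sinkhorn_fixed_samples(x, y)
    print("regularized OT estimate:", value)
    print("max row-marginal error:", np.abs(plan.sum(axis=1) - 1 / 30).max())
```

Because the two sample sets are drawn once and then reused at every iteration, the value returned here is the regularized OT distance between two empirical measures. It therefore inherits the sampling error of those measures, which is the limitation the online variant addresses by drawing fresh samples at each step.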
