A stochastic Gauss-Newton algorithm for regularized semi-discrete optimal transport

We introduce a new second-order stochastic algorithm to estimate the entropically regularized optimal transport cost between two probability measures. The source measure can be chosen arbitrarily, either absolutely continuous or discrete, while the target measure is assumed to be discrete. To solve the semi-dual formulation of this regularized, semi-discrete optimal transport problem, we propose a stochastic Gauss-Newton algorithm that uses a sequence of data sampled from the source measure. This algorithm is shown to adapt to the geometry of the underlying convex optimization problem, and it has no critical hyperparameter that requires accurate tuning. We establish the almost sure convergence and the asymptotic normality of various estimators of interest constructed from this stochastic Gauss-Newton algorithm. We also analyze their non-asymptotic rates of convergence for the expected quadratic risk in the absence of strong convexity of the underlying objective function. Numerical experiments on simulated data illustrate the finite-sample properties of this Gauss-Newton algorithm for stochastic regularized optimal transport and show its advantages over the stochastic gradient descent, stochastic Newton, and ADAM algorithms.
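To make the idea of a Gauss-Newton step on the semi-dual problem concrete, here is a minimal Python sketch. It is a simplified illustration, not the recursion analyzed in the paper. It rests on several assumptions that do not come from the abstract: scalar data with squared-distance cost, the standard softmax form of the per-sample semi-dual gradient (pi(x, v) - nu), a preconditioner S_n = I + sum of gradient outer products, and a unit step. All function names are hypothetical.

```python
import math
import random


def gibbs_weights(x, v, ys, nu, eps):
    """Return pi_j(x, v), proportional to nu_j * exp((v_j - c(x, y_j)) / eps).

    Assumption: scalar points and squared-distance cost c(x, y) = (x - y)**2.
    """
    scores = [math.log(nu[j]) + (v[j] - (x - ys[j]) ** 2) / eps
              for j in range(len(ys))]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def semi_dual_gradient(x, v, ys, nu, eps):
    """Per-sample gradient in v of the semi-dual loss (minimization form)."""
    pi = gibbs_weights(x, v, ys, nu, eps)
    return [pi[j] - nu[j] for j in range(len(nu))]


def sherman_morrison_update(s_inv, g):
    """Turn s_inv = S^{-1} into (S + g g^T)^{-1} in place (S symmetric)."""
    dim = len(g)
    sg = [sum(s_inv[i][k] * g[k] for k in range(dim)) for i in range(dim)]
    denom = 1.0 + sum(g[i] * sg[i] for i in range(dim))
    for i in range(dim):
        for k in range(dim):
            s_inv[i][k] -= sg[i] * sg[k] / denom


def gauss_newton_sketch(sample_source, ys, nu, eps, n_iter, seed=0):
    """Simplified stochastic Gauss-Newton iteration on the dual vector v.

    Assumption: S_n = I + sum_k g_k g_k^T and a unit step v <- v - S_n^{-1} g_n.
    """
    rng = random.Random(seed)
    dim = len(ys)
    v = [0.0] * dim
    # Start from S_0 = I, so S_0^{-1} = I as well.
    s_inv = [[1.0 if i == k else 0.0 for k in range(dim)] for i in range(dim)]
    for _ in range(n_iter):
        x = sample_source(rng)                   # one draw from the source measure
        g = semi_dual_gradient(x, v, ys, nu, eps)
        sherman_morrison_update(s_inv, g)        # rank-one update of the inverse
        step = [sum(s_inv[i][k] * g[k] for k in range(dim)) for i in range(dim)]
        v = [v[i] - step[i] for i in range(dim)]
    return v
```

For instance, `gauss_newton_sketch(lambda rng: rng.random(), [0.2, 0.8], [0.25, 0.75], 0.1, 5000)` runs the sketch with a uniform source on [0, 1] and a two-point target. The rank-one inverse update avoids any matrix inversion inside the loop, and the accumulated preconditioner shrinks the step automatically, so no learning-rate schedule is set by hand in this sketch.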
