On parameter estimation with the Wasserstein distance

Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the numerical approximation of these estimators. Two of our numerical examples ($g$-and-$\kappa$ and sum of log-normals) are taken from the literature on approximate Bayesian computation and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models.
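As an illustration of the estimator described above, the following is a minimal sketch for a one-dimensional model, not the authors' implementation. It assumes a Gaussian location-scale model, approximates the model distribution by a synthetic sample of the same size as the data, and uses the fact that in one dimension the Wasserstein distance of order 1 between two equal-size empirical distributions is the mean absolute difference of the sorted samples. The function names, the fixed-noise trick that makes the objective deterministic, and the choice of the Nelder-Mead optimizer are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def wasserstein1_1d(x, y):
    """Order-1 Wasserstein distance between two equal-size 1D samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))


def simulate_normal(theta, noise):
    """Hypothetical generative model: theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    return mu + np.exp(log_sigma) * noise


def minimum_wasserstein_estimate(data, simulate, theta0, seed=0):
    """Minimize the Wasserstein distance between data and a synthetic sample."""
    # Fixed noise (common random numbers) keeps the objective deterministic,
    # so a derivative-free optimizer can be applied directly.
    noise = np.random.default_rng(seed).standard_normal(len(data))

    def objective(theta):
        return wasserstein1_1d(data, simulate(theta, noise))

    return minimize(objective, theta0, method="Nelder-Mead").x


# Synthetic data from a well-specified model with mu = 2.0 and sigma = 1.5.
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.5, size=2000)

theta_hat = minimum_wasserstein_estimate(data, simulate_normal, np.array([1.0, 0.5]))
mu_hat, sigma_hat = theta_hat[0], np.exp(theta_hat[1])
print(mu_hat, sigma_hat)
```

In higher dimensions the sorting shortcut no longer applies, and the distance has to be computed or approximated by an optimal transport solver, which is one source of the numerical difficulties mentioned in the abstract.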

[1] V. S. Varadarajan, et al. Weak Convergence of Measures on Separable Metric Spaces, 2016.

[2] Xiaotong Shen, et al. Empirical Likelihood, 2002.

[3] Marco Cuturi, et al. GAN and VAE from an Optimal Transport Point of View, 2017, 1706.01807.

[4] E. Lehmann. Testing Statistical Hypotheses, 1960.

[5] A. Basu, et al. Statistical Inference: The Minimum Distance Approach, 2011.

[6] Jason Altschuler, et al. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration, 2017, NIPS.

[7] A. Doucet, et al. The correlated pseudomarginal method, 2015, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[8] L. Ambrosio, et al. Gradient Flows: In Metric Spaces and in the Space of Probability Measures, 2005.

[9] J. Wolfowitz. The Minimum Distance Method, 1957.

[10] Nacereddine Belili, et al. Estimation basée sur la fonctionnelle de Kantorovich et la distance de Lévy, 1999.

[11] Soumendu Sundar Mukherjee, et al. Weak convergence and empirical processes, 2019.

[12] Fredrik Lindsten, et al. Coupling of Particle Filters, 2016, 1606.01156.

[13] F. Bassetti, et al. Asymptotic Properties and Robustness of Minimum Dissimilarity Estimators of Location-scale Parameters, 2006.

[14] D. McFadden. A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration, 1989.

[15] Adam M. Johansen, et al. A simple approach to maximum intractable likelihood estimation, 2013.

[16] Scott A. Sisson, et al. Recalibration: A post-processing method for approximate Bayesian computation, 2017, Comput. Stat. Data Anal.

[17] L. Brown, et al. Measurable Selections of Extrema, 1973.

[18] James Ze Wang, et al. A Simulated Annealing Based Inexact Oracle for Wasserstein Loss Minimization, 2016, ICML.

[19] John Parslow, et al. On Disturbance State-Space Models and the Particle Marginal Metropolis-Hastings Sampler, 2012, SIAM/ASA J. Uncertain. Quantification.

[20] F. Bach, et al. Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance, 2017, Bernoulli.

[21] Gabriel Peyré, et al. Iterative Bregman Projections for Regularized Transportation Problems, 2014, SIAM J. Sci. Comput.

[22] Jon A. Wellner, et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[23] P. Bickel, et al. On the Choice of m in the m out of n Bootstrap and Confidence Bounds for Extrema, 2008.

[24] G. D. Rayner, et al. Numerical maximum likelihood estimation for the g-and-k and generalized g-and-h distributions, 2002, Stat. Comput.

[25] Victor Chernozhukov, et al. Quantile regression, 2019, Journal of Econometrics.

[26] L. Hansen. Large Sample Properties of Generalized Method of Moments Estimators, 1982.

[27] A. Guillin, et al. On the rate of convergence in Wasserstein distance of the empirical measure, 2013, 1312.2128.

[28] E. Giné, et al. Central limit theorems for the Wasserstein distance between the empirical and the true distributions, 1999.

[29] R. Tibshirani, et al. An introduction to the bootstrap, 1993.

[30] Bruce E. Hansen, et al. Strong Laws for Dependent Heterogeneous Processes, 1991, Econometric Theory.

[31] Carsten Gottschlich, et al. The Shortlist Method for Fast Computation of the Earth Mover's Distance and Finding Optimal Solutions to Transportation Problems, 2014, PloS one.

[32] Gabriel Peyré, et al. Learning Generative Models with Sinkhorn Divergences, 2017, AISTATS.

[33] Jean-Jacques Forneron, et al. The ABC of simulation estimation with auxiliary statistics, 2015, Journal of Econometrics.

[34] Christian P Robert, et al. Bayesian computation via empirical likelihood, 2012, Proceedings of the National Academy of Sciences.

[35] F. Takens. Detecting strange attractors in turbulence, 1981.

[36] Jorge Martinez, et al. Some properties of the Tukey g and h family of distributions, 1984.

[37] Julien Rabin, et al. Wasserstein Barycenter and Its Application to Texture Mixing, 2011, SSVM.

[38] Sophie Dede, et al. An empirical Central Limit Theorem in L1 for stationary sequences, 2008, 0812.2839.

[39] L. Le Cam. On the Assumptions Used to Prove Asymptotic Normality of Maximum Likelihood Estimates, 1970.

[40] Mathieu Gerber, et al. Approximate Bayesian computation with the Wasserstein distance, 2019, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[41] Shakir Mohamed, et al. Learning in Implicit Generative Models, 2016, ArXiv.

[42] Jean-Michel Marin, et al. Approximate Bayesian computational methods, 2011, Statistics and Computing.

[43] Gabriel Peyré, et al. Stochastic Optimization for Large-scale Optimal Transport, 2016, NIPS.

[44] E. Barrio, et al. Central limit theorems for empirical transportation cost in general dimension, 2017, The Annals of Probability.

[45] Wotao Yin, et al. A Parallel Method for Earth Mover’s Distance, 2018, J. Sci. Comput.

[46] C. Villani. Optimal Transport: Old and New, 2008.

[47] L. Devroye. Non-Uniform Random Variate Generation, 1986.

[48] Alessandro Rudi, et al. Massively scalable Sinkhorn distances via the Nyström method, 2018, NeurIPS.

[49] Ronald C. Neath, et al. On Convergence Properties of the Monte Carlo EM Algorithm, 2012, 1206.4768.

[50] Yifan Chen, et al. Natural gradient in Wasserstein statistical manifold, 2018, ArXiv.

[51] Léon Bottou, et al. Wasserstein Generative Adversarial Networks, 2017, ICML.

[52] Michel Barlaud, et al. High-Dimensional Statistical Measure for Region-of-Interest Tracking, 2009, IEEE Transactions on Image Processing.

[53] David B. Dunson, et al. Robust Bayesian Inference via Coarsening, 2015, Journal of the American Statistical Association.

[54] Arnold J Stromberg, et al. Subsampling, 2001, Technometrics.

[55] F. Bassetti, et al. On minimum Kantorovich distance estimators, 2006.

[56] S. Wood. Statistical inference for noisy nonlinear ecological dynamic systems, 2010, Nature.

[57] E. Giné, et al. Asymptotics for L2 functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances, 2005.

[58] Giovanni Puccetti. An Algorithm to Approximate the Optimal Expected Inner Product of Two Vectors with Given Marginals, 2017.

[59] L. Fenton. The Sum of Log-Normal Probability Distributions in Scatter Transmission Systems, 1960.

[60] W. R. Schucany, et al. Minimum Distance and Robust Estimation, 1980.

[61] R Core Team, et al. R: A language and environment for statistical computing, 2014.

[62] Marco Cuturi, et al. On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests, 2015, Entropy.

[63] Julien Rabin, et al. Sliced and Radon Wasserstein Barycenters of Measures, 2014, Journal of Mathematical Imaging and Vision.

[64] John A. Nelder, et al. A Simplex Method for Function Minimization, 1965, Comput. J.

[65] G. C. Wei, et al. A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms, 1990.

[66] Yanan Fan, et al. Handbook of Approximate Bayesian Computation, 2018.

[67] David Pollard, et al. The minimum distance method of testing, 1980.

[68] Gabriel Peyré, et al. Computational Optimal Transport, 2018, Found. Trends Mach. Learn.

[69] Paul Fearnhead, et al. On the Asymptotic Efficiency of Approximate Bayesian Computation Estimators, 2015, 1506.03481.

[70] E. Cheney, et al. The Existence and Unicity of Best Approximations, 1969.

[71] John N. Tsitsiklis, et al. Introduction to linear optimization, 1997, Athena scientific optimization and computation series.

[72] Marco Cuturi, et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.