Inference in generative models using the Wasserstein distance

In purely generative models, one can simulate data given parameters but not necessarily evaluate the likelihood. We use Wasserstein distances between empirical distributions of observed data and empirical distributions of synthetic data drawn from such models to estimate their parameters. Previous interest in the Wasserstein distance for statistical inference has been mainly theoretical, due to computational limitations. Thanks to recent advances in numerical transport, the computation of these distances has become feasible, up to controllable approximation errors. We leverage these advances to propose point estimators and quasi-Bayesian distributions for parameter inference, first for independent data. For dependent data, we extend the approach by using delay reconstruction and residual reconstruction techniques. For large data sets, we propose an alternative distance based on the Hilbert space-filling curve, whose computation scales as n log n, where n is the size of the data. We provide a theoretical study of the proposed estimators, and adaptive Monte Carlo algorithms to approximate them. The approach is illustrated on four examples: a quantile g-and-k distribution, a toggle switch model from systems biology, a Lotka-Volterra model for plankton population sizes, and a Lévy-driven stochastic volatility model.
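The core idea of comparing observed and synthetic samples can be illustrated in one dimension, where the Wasserstein distance between two equal-size empirical distributions reduces to comparing sorted samples. The sketch below is not the paper's algorithm: the Normal location model, the grid search, and the names `wasserstein_1d`, `simulate`, and `minimum_wasserstein_estimate` are all assumptions made for illustration. Sorting costs on the order of n log n, which is the same scaling the abstract attributes to the Hilbert-curve distance in higher dimensions.

```python
import numpy as np


def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two equal-size 1-D empirical distributions.

    In one dimension the optimal coupling matches order statistics,
    so sorting both samples is enough.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))


def simulate(theta, n, rng):
    # Hypothetical toy generative model (an assumption): Normal(theta, 1).
    return theta + rng.standard_normal(n)


def minimum_wasserstein_estimate(observed, grid, seed=0):
    """Return the grid value whose synthetic sample is closest to the data."""
    n = len(observed)
    distances = []
    for theta in grid:
        # Re-seeding gives every theta the same underlying noise
        # (common random numbers), which smooths the objective.
        rng = np.random.default_rng(seed)
        distances.append(wasserstein_1d(observed, simulate(theta, n, rng)))
    distances = np.array(distances)
    return float(grid[int(np.argmin(distances))]), distances


# Data generated with true parameter 2.0 (illustrative choice).
observed = simulate(2.0, 500, np.random.default_rng(1))
grid = np.linspace(0.0, 4.0, 81)
theta_hat, distances = minimum_wasserstein_estimate(observed, grid)
print("estimated theta:", theta_hat)
```

A grid search is used here only to keep the example short; for models with several parameters it would be replaced by an optimizer or a Monte Carlo scheme.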
