Measuring Sample Quality with Stein's Method

To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, this inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality, such as effective sample size, do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that quantifies the maximum discrepancy between sample and target expectations over a large class of test functions. We use this tool to compare exact, biased, and deterministic sample sequences, and we illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.
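The core idea of a Stein-based quality measure can be illustrated in a few lines. A discrepancy of the form sup over h of |E_Q[h(X)] - E_P[h(X)]| compares sample (Q) and target (P) expectations, but computing E_P[h] is usually as hard as the integration problem itself. Stein's method sidesteps this with an operator T whose outputs satisfy E_P[(T g)(X)] = 0, so the sample average of T g on its own measures the gap. The sketch below is a minimal illustration under stated assumptions, not the measure introduced above: it assumes a one-dimensional standard normal target, uses one standard choice of operator, the Langevin Stein operator (T g)(x) = g'(x) + g(x) d/dx log p(x) = g'(x) - x g(x), and replaces the large class of test functions with a small, hypothetical, hand-picked family. The function names `langevin_stein_gaussian` and `finite_stein_discrepancy` are illustrative only.

```python
import math
import random

# Hypothetical test family: pairs (g, g') chosen for illustration only.
# Each g is smooth with integrable derivative and E|X g(X)| < inf, so the
# Stein identity E_P[g'(X) - X g(X)] = 0 holds for X ~ N(0, 1).
TEST_FUNCTIONS = [
    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    (math.sin, math.cos),
    (lambda x: math.exp(-x * x / 2), lambda x: -x * math.exp(-x * x / 2)),
    (lambda x: x / (1 + x * x), lambda x: (1 - x * x) / (1 + x * x) ** 2),
]


def langevin_stein_gaussian(g, dg, x):
    """(T g)(x) = g'(x) + g(x) * d/dx log p(x); for p = N(0, 1), d/dx log p(x) = -x."""
    return dg(x) - x * g(x)


def finite_stein_discrepancy(sample):
    """Max over the test family of |(1/n) * sum_i (T g)(x_i)|.

    Because E_P[(T g)(X)] = 0, no target expectation is ever computed:
    the sample average of T g is itself the sample-minus-target gap.
    """
    n = len(sample)
    return max(
        abs(sum(langevin_stein_gaussian(g, dg, x) for x in sample) / n)
        for g, dg in TEST_FUNCTIONS
    )


if __name__ == "__main__":
    rng = random.Random(0)
    exact = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # draws from the target
    biased = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean shifted by 1
    print("exact sample: ", finite_stein_discrepancy(exact))
    print("biased sample:", finite_stein_discrepancy(biased))
```

Running it, the shifted sample should receive a markedly larger score than the exact one. For samples from N(μ, 1), the population value of each term is E[g'(X)] - E[X g(X)] = -μ E[g(X)], so the score grows with the bias while exact samples are only penalized by Monte Carlo noise. A finite family like this one can miss discrepancies that none of its members detect, which is why the measure described above optimizes over a much larger class of test functions.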