Stein’s Method Meets Computational Statistics: A Review of Some Recent Developments

Stein’s method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein’s method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stimulate further research into the successful field of Stein’s method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.
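
To make the central object of the survey concrete, the sketch below illustrates one of the tools mentioned above: a kernel Stein discrepancy built from a Langevin Stein operator, here for a one-dimensional standard Gaussian target with a Gaussian (RBF) kernel. This is a minimal illustration only, not code from the survey; the function names, the kernel and bandwidth choice, and the V-statistic estimator are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the survey): a kernel Stein discrepancy (KSD) for a
# one-dimensional standard Gaussian target p(x) proportional to exp(-x^2 / 2),
# built from the Langevin Stein operator and a Gaussian (RBF) kernel.
# All function names, the bandwidth and the V-statistic estimator are illustrative.

def score_std_normal(x):
    """Score function grad_x log p(x) of the standard normal target."""
    return -x

def rbf_kernel_terms(x, y, bandwidth=1.0):
    """RBF kernel k(x, y) and the derivatives needed for the Stein kernel (1-d)."""
    diff = x - y
    k = np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))
    dk_dx = -diff / bandwidth ** 2 * k                                  # dk/dx
    dk_dy = diff / bandwidth ** 2 * k                                   # dk/dy
    d2k_dxdy = (1.0 / bandwidth ** 2 - diff ** 2 / bandwidth ** 4) * k  # d^2 k / dx dy
    return k, dk_dx, dk_dy, d2k_dxdy

def stein_kernel(x, y, bandwidth=1.0):
    """Stein reproducing kernel u_p(x, y) for the 1-d Langevin Stein operator."""
    sx, sy = score_std_normal(x), score_std_normal(y)
    k, dk_dx, dk_dy, d2k_dxdy = rbf_kernel_terms(x, y, bandwidth)
    return sx * sy * k + sx * dk_dy + sy * dk_dx + d2k_dxdy

def ksd_squared_vstat(samples, bandwidth=1.0):
    """V-statistic estimate of the squared KSD between the sample and the target."""
    n = len(samples)
    total = sum(stein_kernel(xi, xj, bandwidth) for xi in samples for xj in samples)
    return total / n ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    good = rng.standard_normal(200)          # sample from the target itself
    biased = rng.standard_normal(200) + 0.5  # sample from a shifted distribution
    print("KSD^2, target sample:", ksd_squared_vstat(good))
    print("KSD^2, biased sample:", ksd_squared_vstat(biased))
```

Run on a sample drawn from the target, the estimate should be close to zero; on the shifted sample it should be noticeably larger. This is the sense in which such discrepancies, which require only the unnormalised score of the target, can be used to benchmark and compare (approximate) sampling methods.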
