Stein’s Method Meets Statistics: A Review of Some Recent Developments

Stein’s method is a collection of tools for analysing distributional comparisons through the study of a class of linear operators called Stein operators. Originally studied in probability, Stein’s method has also enabled some important developments in statistics. This early success has led to a high level of research activity in the area in recent years. The goal of this survey is to bring together some of these developments in both theoretical and computational statistics and, in doing so, to stimulate further research at the interface of Stein’s method and statistics. The topics we discuss include: explicit error bounds for asymptotic approximations of estimators and test statistics, a measure of prior sensitivity in Bayesian statistics, tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, and goodness-of-fit testing.
