Semi-Exact Control Functionals From Sard’s Method

This paper focuses on the numerical computation of posterior expected quantities of interest, where the accuracy of existing approaches based on ergodic averages is limited by the asymptotic variance of the integrand. To address this challenge, a novel variance reduction technique is proposed, based on Sard's approach to numerical integration and the control functional method. The use of Sard's approach ensures that our control functionals are exact on all polynomials up to a fixed degree in the Bernstein-von Mises limit, so that the reduced-variance estimator approximates the behaviour of a polynomially exact (e.g. Gaussian) cubature method. The proposed estimator has reduced mean square error compared to its competitors, and is illustrated on several Bayesian inference examples. All methods used in this paper are available in the R package ZVCV.
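
The polynomial exactness property can be illustrated with a minimal sketch in Python. This is not the paper's estimator and not the ZVCV API: it keeps only a polynomial control variate built from the Langevin Stein operator and omits the kernel (control functional) component. The one-dimensional Gaussian target, the integrand f(x) = x^2, and the function names `stein_poly_features` and `polynomial_cv_estimate` are all assumptions made for illustration. When the target is exactly Gaussian, which is the situation the Bernstein-von Mises limit approaches, applying the operator to polynomials of degree at most two spans every such polynomial, so the fitted intercept should recover the integral up to rounding error.

```python
import numpy as np

def stein_poly_features(x, score):
    """Apply the 1-D Langevin Stein operator L g = g'' + g' * score
    to the monomials g(x) = x and g(x) = x**2."""
    g1 = score                   # g(x) = x:    g'' = 0, g' = 1
    g2 = 2.0 + 2.0 * x * score   # g(x) = x**2: g'' = 2, g' = 2x
    return np.column_stack([g1, g2])

def polynomial_cv_estimate(f_vals, x, score):
    """Least-squares fit of f on [1, L g1, L g2].
    Each L g has zero mean under the target, so the fitted
    intercept serves as the estimate of E[f]."""
    design = np.column_stack([np.ones_like(x), stein_poly_features(x, score)])
    coef, *_ = np.linalg.lstsq(design, f_vals, rcond=None)
    return coef[0]

# Hypothetical setup: Gaussian target N(mu, sigma^2), sampled independently.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=50)
score = -(x - mu) / sigma**2     # derivative of the log density
f_vals = x**2                    # true value: mu^2 + sigma^2 = 5

plain = f_vals.mean()                            # ordinary Monte Carlo average
cv = polynomial_cv_estimate(f_vals, x, score)    # polynomial control variate
print(f"plain average: {plain:.4f}, with control variate: {cv:.10f}")
```

With only 50 samples the plain average carries Monte Carlo error, whereas the control variate estimate depends on the samples only through floating-point rounding. For a non-Gaussian posterior the polynomial part is no longer exact, which is where the kernel component of the proposed method would come in.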
