Vector-Valued Control Variates

Control variates are post-processing tools for Monte Carlo estimators that can lead to significant variance reduction. However, the approach usually requires a large number of samples, which can be prohibitive in applications where sampling from a posterior or evaluating the integrand is computationally expensive. Moreover, many scenarios require computing multiple related integrals, either simultaneously or sequentially, which further increases the computational cost. In this paper, we propose vector-valued control variates, an extension of control variates that reduces the variance of multiple integrals jointly. This allows information to be transferred across integration tasks, and hence reduces the overall number of samples required. We focus on control variates based on kernel interpolants; our construction relies on a generalised Stein identity and on novel matrix-valued Stein reproducing kernels. We demonstrate our methodology on a range of problems, including multifidelity modelling and model evidence computation through thermodynamic integration.
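To make the idea of jointly estimating related integrals more concrete, here is a minimal NumPy sketch of kernel-based control variates for two integrands. It assumes a one-dimensional standard normal target p = N(0, 1) shared by both tasks, a Gaussian RBF base kernel, the Langevin Stein operator (A h)(x) = h'(x) + h(x) d/dx log p(x), and a separable matrix-valued kernel B k0 with a hand-picked task-correlation matrix B. Each integrand is modelled as f_t = c_t + g_t, where g_t lies in the range of the Stein operator and therefore integrates to zero under p; the integrals c_t are then estimated by generalised least squares. With a single task this reduces to a scalar estimator of the form c = 1^T K^{-1} y / 1^T K^{-1} 1. The function names, the jitter value and B are illustrative assumptions, and this construction is not necessarily the matrix-valued Stein kernel developed in the paper.

```python
import numpy as np


def stein_rbf_kernel(x, y, score, ell=1.0):
    """Langevin Stein kernel k0(x, y) = A_x A_y k(x, y) for 1-D inputs, where
    k(x, y) = exp(-(x - y)^2 / (2 ell^2)) and (A h)(x) = h'(x) + score(x) h(x)."""
    r = x[:, None] - y[None, :]
    k = np.exp(-r**2 / (2 * ell**2))
    dk_dx = -(r / ell**2) * k
    dk_dy = (r / ell**2) * k
    d2k_dxdy = (1 / ell**2 - r**2 / ell**4) * k
    ux, uy = score(x)[:, None], score(y)[None, :]
    return d2k_dxdy + ux * dk_dy + uy * dk_dx + ux * uy * k


def vector_control_variate(samples, values, B, score, ell=1.0, jitter=1e-6):
    """Jointly estimate c_t = E_p[f_t] for T integrands by generalised least
    squares, modelling f_t = c_t + g_t with Cov(g_s(x), g_t(y)) = B[s, t] k0(x, y).
    Each task may use its own samples (all assumed drawn from the same p)."""
    T = len(samples)
    offsets = np.concatenate([[0], np.cumsum([len(x) for x in samples])])
    N = offsets[-1]
    # Block (s, t) of the Gram matrix is B[s, t] * k0(samples[s], samples[t]).
    K = np.zeros((N, N))
    for s in range(T):
        for t in range(T):
            K[offsets[s]:offsets[s + 1], offsets[t]:offsets[t + 1]] = (
                B[s, t] * stein_rbf_kernel(samples[s], samples[t], score, ell))
    K += jitter * np.eye(N)  # small nugget for numerical stability
    # Design matrix: column t indicates which rows belong to task t.
    D = np.zeros((N, T))
    for t in range(T):
        D[offsets[t]:offsets[t + 1], t] = 1.0
    y = np.concatenate(values)
    # GLS estimate c = (D^T K^{-1} D)^{-1} D^T K^{-1} y.
    Kinv_D = np.linalg.solve(K, D)
    Kinv_y = np.linalg.solve(K, y)
    return np.linalg.solve(D.T @ Kinv_D, D.T @ Kinv_y)


def score(x):
    return -x  # d/dx log p(x) for p = N(0, 1)


def f_hi(x):
    return x**2 + 0.1 * x**3  # hypothetical "expensive" integrand, E_p[f_hi] = 1


def f_lo(x):
    return 0.9 * x**2  # hypothetical "cheap" correlated integrand, E_p[f_lo] = 0.9


rng = np.random.default_rng(0)
x_hi = rng.standard_normal(8)   # few evaluations of the expensive integrand
x_lo = rng.standard_normal(50)  # many evaluations of the cheap integrand
B = np.array([[1.0, 0.9], [0.9, 1.0]])  # hypothetical task correlation
est = vector_control_variate([x_hi, x_lo], [f_hi(x_hi), f_lo(x_lo)], B, score)
```

Because the two tasks use different sample sets, the off-diagonal entry of B lets the 50 cheap evaluations of f_lo inform the estimate for f_hi, which has only 8 evaluations; the exact integrals are 1.0 and 0.9, so the output is easy to check. Setting B to the identity makes the Gram matrix block diagonal, which decouples the tasks into independent scalar control variates.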
