Theoretical guarantees for neural control variates in MCMC

We propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate of the asymptotic variance. We focus on the case where the control variates are represented by deep neural networks. We derive the optimal convergence rate of the asymptotic variance under various ergodicity assumptions on the underlying Markov chain. The approach relies on recent results on the stochastic errors of variance reduction algorithms and on function approximation theory.
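
The idea of fitting an additive control variate by minimizing an estimate of the asymptotic variance can be illustrated on a toy problem. The sketch below is not the method analyzed in the paper; it makes several simplifying assumptions that are ours: the chain is a Gaussian AR(1) process with stationary law N(0, 1), the neural network is replaced by a linear combination of three Stein-type functions g = phi'' - x phi' (each has zero mean under N(0, 1)), and the asymptotic variance is estimated by batch means. All function names, the test function f(x) = exp(x/2), and the parameter values are hypothetical.

```python
import numpy as np


def ar1_chain(n, rho, rng):
    """Simulate an AR(1) chain whose stationary law is N(0, 1)."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    noise = rng.standard_normal(n) * np.sqrt(1.0 - rho**2)
    for k in range(1, n):
        x[k] = rho * x[k - 1] + noise[k]
    return x


def batch_means(values, batch_size):
    """Average consecutive blocks; values has shape (n,) or (n, p)."""
    n_batches = len(values) // batch_size
    trimmed = values[: n_batches * batch_size]
    return trimmed.reshape(n_batches, batch_size, -1).mean(axis=1)


def asymptotic_variance(values, batch_size):
    """Batch-means estimate of the asymptotic variance of a scalar series."""
    b = batch_means(values, batch_size)[:, 0]
    return batch_size * b.var(ddof=1)


def fit_control_variate(f_vals, g_vals, batch_size):
    """Choose theta minimizing the batch-means variance of f - g @ theta.

    The objective is quadratic in theta, so the minimizer is a
    least-squares fit of centered batch means of f on those of g.
    """
    bf = batch_means(f_vals, batch_size)[:, 0]
    bg = batch_means(g_vals, batch_size)
    theta, *_ = np.linalg.lstsq(bg - bg.mean(axis=0), bf - bf.mean(), rcond=None)
    return theta


rng = np.random.default_rng(0)
x = ar1_chain(n=50_000, rho=0.5, rng=rng)

# Quantity of interest: E[exp(X / 2)] under N(0, 1), which equals exp(1/8).
f_vals = np.exp(x / 2.0)

# Control variates with zero stationary mean (assumed linear family,
# standing in for a neural network): phi = x, x^2 / 2, x^3 / 3.
g_vals = np.stack([-x, 1.0 - x**2, 2.0 * x - x**3], axis=1)

batch_size = 50
theta = fit_control_variate(f_vals, g_vals, batch_size)
h_vals = f_vals - g_vals @ theta

estimate_plain = f_vals.mean()
estimate_cv = h_vals.mean()
var_plain = asymptotic_variance(f_vals, batch_size)
var_cv = asymptotic_variance(h_vals, batch_size)

print("plain estimate:", estimate_plain, "asymptotic variance:", var_plain)
print("CV estimate:   ", estimate_cv, "asymptotic variance:", var_cv)
```

For simplicity, the coefficients are fitted and evaluated on the same trajectory. A richer function class, such as a neural network, would normally be fitted on a separate training trajectory so that the reported variance is not optimistically biased.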
