On Nesting Monte Carlo Estimators

Many problems in machine learning and statistics involve nested expectations and thus do not permit conventional Monte Carlo (MC) estimation. For such problems, one must nest estimators, so that each term in an outer estimator itself requires computing a separate, nested estimate. We investigate the statistical implications of nesting MC estimators, including cases with multiple levels of nesting, and establish the conditions under which they converge. We derive the corresponding rates of convergence and provide empirical evidence that these rates are observed in practice. We further identify a number of pitfalls that can arise from naive nesting of MC estimators, provide guidelines on how to avoid them, and lay out novel methods for reformulating certain classes of nested expectation problems into single expectations, leading to improved convergence rates. We demonstrate the applicability of our work by using our results to develop a new estimator for discrete Bayesian experimental design problems and to derive error bounds for a class of variational objectives.
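
As a rough illustration of what nesting an MC estimator means, the sketch below estimates a nested expectation of the form E_y[f(E_z[z | y])] for a toy model. The model (y ~ N(0, 1), z | y ~ N(y, 1), f(g) = g^2), the function name `nested_mc`, and the sample sizes are all assumptions chosen for illustration and are not taken from the paper. For this toy model the true value is E[y^2] = 1, while the nested estimator with M inner samples has expected value 1 + 1/M, so the bias depends on the inner sample size and cannot be removed by increasing the number of outer samples alone.

```python
import random


def nested_mc(n_outer, n_inner, rng):
    """Nested MC estimate of E_y[(E_z[z | y])^2] for a toy Gaussian model.

    Assumed toy model (not from the paper): y ~ N(0, 1), z | y ~ N(y, 1).
    """
    total = 0.0
    for _ in range(n_outer):
        y = rng.gauss(0.0, 1.0)  # outer sample
        # Inner MC estimate of E[z | y], which equals y exactly in this model.
        inner = sum(rng.gauss(y, 1.0) for _ in range(n_inner)) / n_inner
        # Nonlinear function of the inner estimate; this is the source of bias.
        total += inner ** 2
    return total / n_outer


rng = random.Random(0)
for m in (1, 10, 100):
    est = nested_mc(5000, m, rng)
    print(f"M={m:3d}  estimate={est:.3f}  estimator mean={1 + 1 / m:.3f}  truth=1.000")
```

With a fixed number of outer samples, the printed estimates should sit close to 1 + 1/M rather than to 1, which is one simple instance of why the inner sample size must grow for a nested estimator to converge.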
