Adaptive strategy for stratified Monte Carlo sampling

We consider the problem of stratified sampling for Monte Carlo integration of a random variable. We model this problem as a K-armed bandit, where the arms represent the K strata. The goal is to estimate the integral mean, that is, a weighted average of the mean values of the arms. The learner is allowed to sample the variable n times, but it can decide on-line which stratum to sample next. We propose a UCB-type strategy that samples the arms according to an upper confidence bound on their estimated standard deviations. We compare its performance to an ideal sample allocation that knows the standard deviations of the arms. For sub-Gaussian arm distributions, we provide bounds on the total regret: a distribution-dependent bound of order poly(λ_min^{-1}) O(n^{-3/2}) that depends on a measure λ_min of the disparity of the per-stratum variances, and a distribution-free bound of order poly(K) O(n^{-7/6}) that does not. We give similar, but somewhat sharper, bounds on a proxy of the regret. The problem-independent bound for this proxy matches its recent minimax lower bound in terms of n up to a log n factor.
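To make the allocation idea concrete, here is a minimal sketch of a UCB-type stratified sampler in the spirit described above: each round it samples the stratum whose weighted upper confidence bound on the standard deviation, divided by the number of samples already allocated to it, is largest (a proxy for the ideal allocation proportional to w_k σ_k). The function name `mc_ucb`, the confidence-width constant `c`, and the exact form of the confidence term are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random

def mc_ucb(strata, weights, n, c=2.0):
    """Sketch of a UCB-type allocation for stratified Monte Carlo.

    strata  : list of K zero-argument samplers, one per stratum
    weights : list of K stratum weights w_k (summing to 1)
    n       : total sampling budget
    c       : confidence-width constant (an illustrative assumption)

    Returns the stratified estimate sum_k w_k * mean_k.
    """
    K = len(strata)
    samples = [[] for _ in range(K)]
    # Initialization: two samples per stratum so the empirical
    # standard deviation is well defined.
    for k in range(K):
        samples[k].append(strata[k]())
        samples[k].append(strata[k]())
    for t in range(2 * K, n):
        scores = []
        for k in range(K):
            m = len(samples[k])
            mean = sum(samples[k]) / m
            var = sum((x - mean) ** 2 for x in samples[k]) / (m - 1)
            # Upper confidence bound on the stratum standard deviation.
            ucb_sd = math.sqrt(var) + c * math.sqrt(math.log(t + 1) / m)
            scores.append(weights[k] * ucb_sd / m)
        k_star = max(range(K), key=lambda k: scores[k])
        samples[k_star].append(strata[k_star]())
    # Stratified estimate: weighted average of per-stratum means.
    return sum(w * (sum(s) / len(s)) for w, s in zip(weights, samples))
```

For example, with two equally weighted strata drawing from Uniform(0, 1) and Uniform(1, 3), the estimate converges to the true integral mean 0.5 · 0.5 + 0.5 · 2 = 1.25, with more of the budget spent on the higher-variance second stratum.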
