Paper information - Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem

We consider the problem of \textit{best arm identification} with a \textit{fixed budget $T$} in the $K$-armed stochastic bandit setting, where the arm distributions are supported on $[0,1]$. We prove that for any bandit strategy there is at least one bandit problem of complexity $H$ on which the strategy misidentifies the best arm with probability lower bounded by $$\exp\Big(-\frac{T}{\log(K)H}\Big),$$ where $H$ is the sum, over all sub-optimal arms, of the inverse squared gaps. This result formally disproves the general belief - stemming from results in the fixed confidence setting - that there must exist an algorithm for this problem whose probability of error is upper bounded by $\exp(-T/H)$. It also proves that some existing strategies based on the Successive Rejection of the arms are optimal, thereby closing the current gap between upper and lower bounds for the fixed budget best arm identification problem.
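As a rough numerical illustration of the two rates in the abstract, the sketch below computes $H$ for a set of arm means and evaluates both $\exp(-T/(\log(K)H))$ and $\exp(-T/H)$. It assumes that the gap of a sub-optimal arm is the difference between the best mean and that arm's mean, that the best arm is unique, and that all constants are dropped, as they are in the abstract. The function names and the example means are hypothetical and do not come from the paper.

```python
import math


def complexity_h(means):
    """Sum over sub-optimal arms of 1 / gap^2 (assumes a unique best arm)."""
    best_index = max(range(len(means)), key=lambda i: means[i])
    best = means[best_index]
    gaps = [best - m for i, m in enumerate(means) if i != best_index]
    if any(g <= 0 for g in gaps):
        raise ValueError("the best arm must be unique")
    return sum(1.0 / g ** 2 for g in gaps)


def rate_with_log_k(budget, num_arms, h):
    """exp(-T / (log(K) * H)): the rate in the lower bound, constants omitted."""
    return math.exp(-budget / (math.log(num_arms) * h))


def rate_without_log_k(budget, h):
    """exp(-T / H): the rate that was commonly believed to be achievable."""
    return math.exp(-budget / h)


# Hypothetical example: four arms with means in [0, 1].
means = [0.5, 0.4, 0.3, 0.3]
budget = 3000

h = complexity_h(means)  # gaps 0.1, 0.2, 0.2 give H close to 150
print("H =", h)
print("exp(-T/(log(K) H)) =", rate_with_log_k(budget, len(means), h))
print("exp(-T/H)          =", rate_without_log_k(budget, h))
```

For any $K \geq 3$ the factor $\log(K)$ exceeds 1, so the first rate is larger than the second. This is the sense in which the lower bound rules out an error probability of order $\exp(-T/H)$ on every problem.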

Alexandra Carpentier | Andrea Locatelli
