Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and a similar theoretical analysis for the two settings. We prove a performance bound for both versions of the algorithm, showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm and its theoretical analysis can be extended to take the variance of the arms into account and to handle the multi-bandit setting. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed-budget and fixed-confidence algorithms.
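
The abstract describes UGapE only at a high level (a gap-based exploration meta-algorithm covering both settings). As an illustration only, the following is a minimal Python sketch of a generic gap-based fixed-confidence loop; the `pull` interface, the Hoeffding-style radius `beta`, and the stopping threshold `epsilon` are assumptions made for this sketch, not the paper's exact definitions or constants.

```python
import math
import random

def gap_based_best_arm(pull, n_arms, epsilon=0.0, delta=0.05, max_pulls=100_000):
    """Identify an epsilon-best arm with confidence about 1 - delta.

    `pull(k)` must return a reward in [0, 1] from arm k. The confidence
    radius below is a generic Hoeffding bound; UGapE's actual exploration
    rate is tuned differently, so treat this as a sketch of the structure.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms

    def update(k):
        r = pull(k)
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]

    for k in range(n_arms):          # initialize: pull each arm once
        update(k)

    J = 0
    for t in range(n_arms, max_pulls):
        # Hoeffding-style confidence radius (an assumed, union-bounded form).
        def beta(k):
            return math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * counts[k]))

        U = [means[k] + beta(k) for k in range(n_arms)]
        L = [means[k] - beta(k) for k in range(n_arms)]

        # Gap index: B_k = (best upper bound among the other arms) - L_k.
        def B(k):
            return max(U[i] for i in range(n_arms) if i != k) - L[k]

        J = min(range(n_arms), key=B)        # current candidate best arm
        if B(J) <= epsilon:                  # fixed-confidence stopping rule
            return J

        # Pull the more uncertain of the candidate and its strongest rival.
        rival = max((i for i in range(n_arms) if i != J), key=lambda i: U[i])
        update(J if beta(J) >= beta(rival) else rival)

    return J  # budget exhausted: return the last candidate

if __name__ == "__main__":
    # Four Bernoulli arms; arm 2 (mean 0.8) should be identified.
    arms = [0.3, 0.5, 0.8, 0.6]
    best = gap_based_best_arm(lambda k: float(random.random() < arms[k]), len(arms))
    print("candidate best arm:", best)
```

The same index structure supports the fixed-budget variant: instead of stopping when the gap index falls below `epsilon`, one would run for a fixed number of pulls and recommend the final candidate.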
