Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

We consider the problem of near-optimal arm identification in the fixed-confidence setting of the infinitely-armed bandit problem, where nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity that is within a log factor of our lower bound; and (4) discuss whether our log^2(1/delta) dependence is inescapable for "two-phase" algorithms (those that select arms first and identify the best among them later) in the infinite setting. This work permits the application of bandit models to a broader class of problems, where fewer assumptions hold.
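To make the "two-phase" template concrete, here is a minimal Python sketch, assuming Bernoulli rewards in [0, 1] and a reservoir we can sample fresh arms from. The function two_phase_top_quantile, its constants, and the naive uniform-sampling identification subroutine are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def two_phase_top_quantile(reservoir, alpha, epsilon, delta):
    """Hedged sketch of a two-phase scheme: (1) draw enough arms from the
    reservoir so that, with probability >= 1 - delta/2, at least one lies
    in the top-alpha quantile; (2) run a naive uniform-sampling PAC
    identification among them at confidence delta/2.

    `reservoir()` returns a fresh arm: a zero-argument callable yielding
    a reward in [0, 1]. All constants here are illustrative.
    """
    # Phase 1: each draw lands in the top-alpha quantile w.p. alpha, so
    # n = ceil(log(2/delta) / alpha) draws all miss it w.p. <= delta/2.
    n = math.ceil(math.log(2 / delta) / alpha)
    arms = [reservoir() for _ in range(n)]

    # Phase 2: pull every sampled arm t times. By Hoeffding plus a union
    # bound over the n arms, every empirical mean is epsilon/2-accurate
    # w.p. >= 1 - delta/2, so the empirical best arm is within epsilon of
    # the best sampled arm. (Adaptive subroutines such as LUCB or
    # successive elimination improve on this uniform allocation.)
    t = math.ceil((2 / epsilon ** 2) * math.log(4 * n / delta))
    means = [sum(arm() for _ in range(t)) / t for arm in arms]
    return arms[means.index(max(means))]


if __name__ == "__main__":
    # Toy reservoir: arm means are Uniform[0, 1], rewards are Bernoulli.
    def reservoir():
        mu = random.random()
        return lambda: float(random.random() < mu)

    best = two_phase_top_quantile(reservoir, alpha=0.1, epsilon=0.1, delta=0.05)
```

Note that even this crude sketch exhibits the dependence the abstract asks about: phase 1 draws n proportional to log(1/delta) arms, and phase 2 pulls each t proportional to log(n/delta) times, so the total sample count scales as log^2(1/delta).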
