Pure Exploration in Infinitely-Armed Bandit Models with Fixed-Confidence

We consider the problem of near-optimal arm identification in the fixed-confidence setting of the infinitely-armed bandit problem, where nothing is known about the arm reservoir distribution. We (1) introduce a PAC-like framework within which to derive and cast results; (2) derive a sample complexity lower bound for near-optimal arm identification; (3) propose an algorithm that identifies a nearly-optimal arm with high probability and derive an upper bound on its sample complexity that is within a log factor of our lower bound; and (4) discuss whether our log^2(1/delta) dependence is inescapable for "two-phase" algorithms (those that select arms first and identify the best among them later) in the infinite setting. This work permits the application of bandit models to a broader class of problems, where fewer assumptions hold.
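To make the "two-phase" template concrete, here is a minimal Python sketch, assuming Bernoulli rewards in [0, 1] and a reservoir we can sample fresh arms from. The function two_phase_top_quantile, its constants, and the naive uniform-sampling identification subroutine are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def two_phase_top_quantile(reservoir, alpha, epsilon, delta):
    """Hedged sketch of a two-phase scheme: (1) draw enough arms from the
    reservoir so that, with probability >= 1 - delta/2, at least one lies
    in the top-alpha quantile; (2) run a naive uniform-sampling PAC
    identification among them at confidence delta/2.

    `reservoir()` returns a fresh arm: a zero-argument callable yielding
    a reward in [0, 1]. All constants here are illustrative.
    """
    # Phase 1: each draw lands in the top-alpha quantile w.p. alpha, so
    # n = ceil(log(2/delta) / alpha) draws all miss it w.p. <= delta/2.
    n = math.ceil(math.log(2 / delta) / alpha)
    arms = [reservoir() for _ in range(n)]

    # Phase 2: pull every sampled arm t times. By Hoeffding plus a union
    # bound over the n arms, every empirical mean is epsilon/2-accurate
    # w.p. >= 1 - delta/2, so the empirical best arm is within epsilon of
    # the best sampled arm. (Adaptive subroutines such as LUCB or
    # successive elimination improve on this uniform allocation.)
    t = math.ceil((2 / epsilon ** 2) * math.log(4 * n / delta))
    means = [sum(arm() for _ in range(t)) / t for arm in arms]
    return arms[means.index(max(means))]


if __name__ == "__main__":
    # Toy reservoir: arm means are Uniform[0, 1], rewards are Bernoulli.
    def reservoir():
        mu = random.random()
        return lambda: float(random.random() < mu)

    best = two_phase_top_quantile(reservoir, alpha=0.1, epsilon=0.1, delta=0.05)
```

Note that even this crude sketch exhibits the dependence the abstract asks about: phase 1 draws n proportional to log(1/delta) arms, and phase 2 pulls each t proportional to log(n/delta) times, so the total sample count scales as log^2(1/delta).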
