Good arm identification via bandit feedback

We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm whose expected reward is greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeatedly outputs an arm as soon as it is identified as good, before confirming that the remaining arms are actually not good. The objective of GAI is to minimize the number of samples required for each output. We find that GAI faces a new kind of dilemma, the exploration-exploitation dilemma of confidence, which is distinct from the dilemma in best arm identification; as a result, efficient algorithm design for GAI differs considerably from that for best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor $$\mathrm{O}(\log \frac{1}{\delta})$$ for acceptance error rate $$\delta$$. We also develop an algorithm whose sample complexity almost matches the lower bound, and we confirm experimentally that it outperforms naive algorithms in synthetic settings based on a conventional bandit problem and on clinical trial research for rheumatoid arthritis.
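As a concrete illustration (not the algorithm proposed in the paper), the following is a minimal sketch of a $$\delta$$-correct GAI procedure of the naive kind the paper compares against: it samples the still-undecided arms uniformly and uses anytime Hoeffding confidence bounds to accept an arm the moment its lower confidence bound clears the threshold, or to discard it once its upper bound falls below. The function name, the uniform sampling rule, and the specific confidence radius are illustrative assumptions, not the paper's method.

```python
import math
import random

def good_arm_identification(arms, xi, delta, max_samples=100_000):
    """Output arms whose mean reward is >= threshold xi, with error rate delta.

    Illustrative sketch only (uniform sampling, not the paper's algorithm).
    `arms` is a list of callables, each returning one reward in [0, 1].
    """
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))   # arms not yet accepted or discarded
    good, t = [], 0

    while active and t < max_samples:
        for i in list(active):        # naive round-robin over undecided arms
            t += 1
            r = arms[i]()             # draw one sample of arm i
            counts[i] += 1
            sums[i] += r
            mean = sums[i] / counts[i]
            # Anytime Hoeffding-style radius; the union bound over the k arms
            # and all sample counts keeps total error probability below delta.
            rad = math.sqrt(math.log(4 * k * counts[i] ** 2 / delta)
                            / (2 * counts[i]))
            if mean - rad >= xi:      # confidently good: output immediately
                good.append(i)
                active.remove(i)
            elif mean + rad < xi:     # confidently not good: discard
                active.remove(i)
    return good

# Example: three Bernoulli arms with means 0.3, 0.5, 0.8 and threshold 0.6.
arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.8)]
print(good_arm_identification(arms, xi=0.6, delta=0.05))
```

The confidence radius here takes a union bound over arms and sample counts so that the overall misclassification probability stays below $$\delta$$; a sharper anytime bound or an adaptive sampling rule, as the paper pursues, would reduce the number of samples needed per output.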
