Good arm identification via bandit feedback

We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm whose expected reward is greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeatedly outputs an arm as soon as it is identified as good, before confirming that the remaining arms are actually not good. The objective of GAI is to minimize the number of samples required for each output. We find that GAI faces a new kind of dilemma, the exploration-exploitation dilemma of confidence, which is distinct from the dilemma in best arm identification; as a result, efficient algorithm design for GAI differs considerably from that for best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor $$\mathrm{O}(\log \frac{1}{\delta})$$ for acceptance error rate $$\delta$$. We also develop an algorithm whose sample complexity almost matches the lower bound, and we confirm experimentally that it outperforms naive algorithms in synthetic settings based on a conventional bandit problem and on clinical trial research for rheumatoid arthritis.
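As a concrete illustration (not the algorithm proposed in the paper), the following is a minimal sketch of a $$\delta$$-correct GAI procedure of the naive kind the paper compares against: it samples the still-undecided arms uniformly and uses anytime Hoeffding confidence bounds to accept an arm the moment its lower confidence bound clears the threshold, or to discard it once its upper bound falls below. The function name, the uniform sampling rule, and the specific confidence radius are illustrative assumptions, not the paper's method.

```python
import math
import random

def good_arm_identification(arms, xi, delta, max_samples=100_000):
    """Output arms whose mean reward is >= threshold xi, with error rate delta.

    Illustrative sketch only (uniform sampling, not the paper's algorithm).
    `arms` is a list of callables, each returning one reward in [0, 1].
    """
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))   # arms not yet accepted or discarded
    good, t = [], 0

    while active and t < max_samples:
        for i in list(active):        # naive round-robin over undecided arms
            t += 1
            r = arms[i]()             # draw one sample of arm i
            counts[i] += 1
            sums[i] += r
            mean = sums[i] / counts[i]
            # Anytime Hoeffding-style radius; the union bound over the k arms
            # and all sample counts keeps total error probability below delta.
            rad = math.sqrt(math.log(4 * k * counts[i] ** 2 / delta)
                            / (2 * counts[i]))
            if mean - rad >= xi:      # confidently good: output immediately
                good.append(i)
                active.remove(i)
            elif mean + rad < xi:     # confidently not good: discard
                active.remove(i)
    return good

# Example: three Bernoulli arms with means 0.3, 0.5, 0.8 and threshold 0.6.
arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.8)]
print(good_arm_identification(arms, xi=0.6, delta=0.05))
```

The confidence radius here takes a union bound over arms and sample counts so that the overall misclassification probability stays below $$\delta$$; a sharper anytime bound or an adaptive sampling rule, as the paper pursues, would reduce the number of samples needed per output.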
