Bayesian Anytime m-top Exploration

We introduce Boundary Focused Thompson Sampling (BFTS), a new Bayesian algorithm for the anytime m-top exploration problem, in which the objective is to identify the m best arms of a multi-armed bandit. First, we consider a set of existing benchmark problems with sub-Gaussian reward distributions (i.e., Gaussian rewards with fixed variance, and categorical rewards). Next, we introduce a new environment inspired by a real-world decision problem concerning insect control in organic agriculture; this environment uses a Poisson reward distribution. On all of these benchmarks, we experimentally show that BFTS consistently outperforms AT-LUCB, the current state-of-the-art algorithm.
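To make the setting concrete, here is a minimal sketch of what one round of a boundary-focused Thompson-sampling scheme could look like for the Poisson environment, assuming conjugate Gamma posteriors per arm. The abstract does not specify BFTS's arm-selection rule, so the particular choice of pulling the arm whose posterior sample falls just below the top-m boundary, along with the names `bfts_round`, `alpha`, and `beta` and the prior and recommendation rules, are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def bfts_round(alpha, beta, m):
    """One illustrative boundary-focused Thompson-sampling round.

    Assumption: pull the arm whose posterior sample is ranked
    just below the top-m boundary (rank m+1)."""
    # Draw one posterior sample per arm: lambda_a ~ Gamma(alpha_a, rate=beta_a).
    samples = rng.gamma(alpha, 1.0 / beta)
    ranked = np.argsort(samples)[::-1]  # arms sorted by sampled mean, descending
    return ranked[m]                    # the boundary arm

# Hypothetical 5-arm Poisson bandit; we want the m = 2 best arms.
lambdas = np.array([4.0, 3.5, 3.0, 1.0, 0.5])
K, m = len(lambdas), 2
alpha, beta = np.ones(K), np.ones(K)    # Gamma(1, 1) priors (an assumption)

for t in range(2000):
    a = bfts_round(alpha, beta, m)
    r = rng.poisson(lambdas[a])         # observe a Poisson reward
    alpha[a] += r                       # conjugate Gamma posterior update
    beta[a] += 1.0

# Anytime m-top recommendation (one common rule): highest posterior means.
print(np.argsort(alpha / beta)[::-1][:m])
```

Because the algorithm is anytime, the recommendation in the last line can be queried after any number of rounds; the sub-Gaussian benchmarks would follow the same pattern with Gaussian or Dirichlet posteriors in place of the Gamma ones.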
