Top Arm Identification in Multi-Armed Bandits with Batch Arm Pulls

We introduce a new multi-armed bandit (MAB) problem in which arms must be sampled in batches, rather than one at a time. This is motivated by applications in social media monitoring and biological experimentation, where such batch constraints arise naturally. This paper develops and analyzes algorithms for top arm identification in batch MABs, in both the fixed confidence and fixed budget settings. Our main theoretical results show that the batch constraint does not significantly affect the sample complexity of top arm identification compared to unconstrained MAB algorithms. Alternatively, if one views a batch as the fundamental sampling unit, then the results can be interpreted as showing that the sample complexity of batch MABs can be significantly less than that of traditional MABs. We demonstrate the new batch MAB algorithms with simulations and on two real-world applications: (i) microwell array experiments for identifying genes that are important in virus replication and (ii) finding the most active users on Twitter on a specific topic.
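To make the batch-pull setting concrete, below is a minimal sketch of a batched successive-elimination routine for the fixed confidence setting. This is a generic illustration under stated assumptions, not the algorithm analyzed in the paper: the names (`batched_successive_elimination`, `pull_batch`), the Hoeffding-style confidence radius, and the schedule that pulls every surviving arm one batch per round are all assumptions made for the sketch.

```python
import numpy as np

def batched_successive_elimination(pull_batch, n_arms, batch_size,
                                   delta=0.05, max_rounds=10_000):
    """Identify the best arm when samples arrive in batches of `batch_size`.

    pull_batch(arm, b) must return b i.i.d. rewards in [0, 1] for `arm`.
    A generic batched successive-elimination sketch (fixed confidence),
    not the paper's algorithm; the confidence radius is a standard
    Hoeffding-style bound with a crude union-bound correction.
    """
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)

    for _ in range(max_rounds):
        if len(active) == 1:
            return active[0]
        # One batch per surviving arm per round: the batch, not the
        # single pull, is the sampling unit here.
        for a in active:
            rewards = pull_batch(a, batch_size)
            means[a] = (means[a] * counts[a] + np.sum(rewards)) \
                       / (counts[a] + batch_size)
            counts[a] += batch_size
        # Hoeffding-style confidence radius per active arm.
        rad = {a: np.sqrt(np.log(4 * n_arms * counts[a] ** 2 / delta)
                          / (2 * counts[a]))
               for a in active}
        # Eliminate arms whose upper bound falls below the empirical
        # leader's lower bound.
        best = max(active, key=lambda a: means[a])
        active = [a for a in active
                  if means[a] + rad[a] >= means[best] - rad[best]]
    return max(active, key=lambda a: means[a])

# Example usage: three Bernoulli arms with means 0.5, 0.6, 0.8,
# sampled in batches of 16 pulls.
rng = np.random.default_rng(0)
mus = [0.5, 0.6, 0.8]
best = batched_successive_elimination(
    lambda a, b: rng.binomial(1, mus[a], size=b),
    n_arms=3, batch_size=16)
print(best)  # arm 2, with probability at least 1 - delta
```

The sketch reflects the interpretation in the abstract: counted in batches, the routine needs roughly a factor of `batch_size` fewer sampling rounds than a one-pull-at-a-time eliminator, while the total number of individual pulls remains of the same order.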
