Nonlinear sequential accepts and rejects for identification of top arms in stochastic bandits

We address the M-best-arm identification problem in multi-armed bandits. A player has a limited budget to explore K arms (M < K), and once pulled, each arm yields a reward drawn (independently) from a fixed, unknown distribution. The goal is to find the top M arms in the sense of expected reward. We develop an algorithm which proceeds in rounds to deactivate arms iteratively. At each round, the budget is divided by a nonlinear function of remaining arms, and the arms are pulled correspondingly. Based on a decision rule, the deactivated arm at each round may be accepted or rejected. The algorithm outputs the accepted arms that should ideally be the top M arms. We characterize the decay rate of the misidentification probability and establish that the nonlinear budget allocation proves to be useful for different problem environments (described by the number of competitive arms). We provide comprehensive numerical experiments showing that our algorithm outperforms the state-of-the-art using suitable nonlinearity.

[1]  Naumaan Nayyar,et al.  Decentralized Learning for Multiplayer Multiarmed Bandits , 2014, IEEE Transactions on Information Theory.

[2]  Qing Zhao,et al.  Distributed Learning in Multi-Armed Bandit With Multiple Players , 2009, IEEE Transactions on Signal Processing.

[3]  Rémi Munos,et al.  Pure exploration in finitely-armed and continuous-armed bandits , 2011, Theor. Comput. Sci..

[4]  T. L. Lai Andherbertrobbins Asymptotically Efficient Adaptive Allocation Rules , 2022 .

[5]  Jian Li,et al.  Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection , 2017, AISTATS.

[6]  Vahid Tarokh,et al.  On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits , 2016, IEEE Transactions on Signal Processing.

[7]  Peter Stone,et al.  Efficient Selection of Multiple Bandit Arms: Theory and Practice , 2010, ICML.

[8]  Ambuj Tewari,et al.  PAC Subset Selection in Stochastic Multi-armed Bandits , 2012, ICML.

[9]  Aurélien Garivier,et al.  On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , 2014, J. Mach. Learn. Res..

[10]  Robert D. Nowak,et al.  Anytime Exploration for Multi-armed Bandits using Confidence Information , 2016, ICML.

[11]  Alessandro Lazaric,et al.  Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence , 2012, NIPS.

[12]  Shie Mannor,et al.  Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..

[13]  R. Munos,et al.  Best Arm Identification in Multi-Armed Bandits , 2010, COLT.

[14]  Sattar Vakili,et al.  Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems , 2011, IEEE Journal of Selected Topics in Signal Processing.

[15]  Lalit Jain,et al.  NEXT: A System for Real-World Development, Evaluation, and Application of Active Learning , 2015, NIPS.

[16]  Oren Somekh,et al.  Almost Optimal Exploration in Multi-Armed Bandits , 2013, ICML.

[17]  Shivaram Kalyanakrishnan,et al.  Information Complexity in Bandit Subset Selection , 2013, COLT.

[18]  Demosthenis Teneketzis,et al.  Multi-Armed Bandit Problems , 2008 .

[19]  Rémi Munos,et al.  Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.

[20]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[21]  Sébastien Bubeck,et al.  Multiple Identifications in Multi-Armed Bandits , 2012, ICML.

[22]  Lin Chen,et al.  On Optimality of Myopic Policy for Restless Multi-Armed Bandit Problem: An Axiomatic Approach , 2012, IEEE Transactions on Signal Processing.

[23]  Jian Li,et al.  Practical Algorithms for Best-K Identification in Multi-Armed Bandits , 2017, ArXiv.

[24]  Matthew Malloy,et al.  lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits , 2013, COLT.

[25]  John N. Tsitsiklis,et al.  The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..

[26]  Xi Chen,et al.  Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing , 2014, ICML.

[27]  Xi Chen,et al.  Adaptive Multiple-Arm Identification , 2017, ICML.