Combinatorial Pure Exploration of Multi-Armed Bandits

We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed bandit setting, where a learner explores a set of arms with the objective of identifying the optimal member of a decision class, which is a collection of subsets of arms with a certain combinatorial structure such as size-K subsets, matchings, spanning trees, or paths. The CPE problem represents a rich class of pure exploration tasks that covers not only many existing models but also novel cases where the object of interest has a non-trivial combinatorial structure. In this paper, we provide a series of results for the general CPE problem. We present general learning algorithms that work, in both the fixed confidence and fixed budget settings, for all decision classes admitting offline maximization oracles, and we prove problem-dependent upper bounds for these algorithms. Our analysis exploits the combinatorial structure of the decision classes and introduces a new analytic tool. We also establish a general problem-dependent lower bound for the CPE problem. Our results show that the proposed algorithms achieve the optimal sample complexity (within logarithmic factors) for many decision classes. In addition, applying our results back to the problems of top-K arm identification and multi-bandit best arm identification, we recover the best available upper bounds up to constant factors and partially resolve a conjecture on the lower bounds.
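To make the oracle-based template concrete, below is a minimal Python sketch of a fixed-confidence strategy in the spirit of the algorithms described above: a confidence-bound loop that only touches the combinatorial structure through a maximization oracle. The radius constants, the stopping rule, and the top-K oracle used for illustration are assumptions for this sketch, not the paper's exact construction.

```python
import math
import random

def topk_oracle(weights, k):
    """Offline maximization oracle for the size-K decision class:
    returns the index set of the K largest weights."""
    order = sorted(range(len(weights)), key=lambda e: weights[e], reverse=True)
    return frozenset(order[:k])

def fixed_confidence_cpe(pull, n, oracle, delta, max_pulls=200_000):
    """Confidence-bound pure exploration driven by a maximization oracle.

    pull(e)  -- draws one stochastic reward from arm e (assumed in [0, 1])
    n        -- number of arms
    oracle   -- maps a weight vector to a maximizing member of the class
    delta    -- target failure probability
    """
    counts = [0] * n
    sums = [0.0] * n
    for e in range(n):                       # initialization: pull every arm once
        sums[e] += pull(e)
        counts[e] += 1
    t = n
    while t < max_pulls:
        means = [sums[e] / counts[e] for e in range(n)]
        m_hat = oracle(means)                # empirically best set
        # Confidence radius; these constants are one common choice, not the paper's.
        rad = [math.sqrt(2.0 * math.log(4.0 * n * t**3 / delta) / counts[e])
               for e in range(n)]
        # Tilt the means against the empirical best set, then ask the oracle
        # whether any other set could still overtake it.
        tilted = [means[e] - rad[e] if e in m_hat else means[e] + rad[e]
                  for e in range(n)]
        m_tilde = oracle(tilted)
        if m_tilde == m_hat:                 # no challenger survives: stop
            return m_hat
        # Otherwise pull the most ambiguous arm distinguishing the two sets.
        p = max(m_hat.symmetric_difference(m_tilde), key=lambda e: rad[e])
        sums[p] += pull(p)
        counts[p] += 1
        t += 1
    return oracle([sums[e] / counts[e] for e in range(n)])

if __name__ == "__main__":
    mus = [0.9, 0.8, 0.7, 0.3, 0.2]          # hypothetical Bernoulli arms
    best = fixed_confidence_cpe(
        pull=lambda e: float(random.random() < mus[e]),
        n=len(mus),
        oracle=lambda w: topk_oracle(w, k=2),
        delta=0.05,
    )
    print(sorted(best))                      # expected: [0, 1]
```

The only problem-specific component is the oracle: replacing `topk_oracle` with, say, a maximum-weight matching or maximum-weight spanning tree solver adapts the same loop to other decision classes, which is the point of the oracle-based formulation.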
