Polynomial-time Algorithms for Combinatorial Pure Exploration with Full-bandit Feedback

We study the problem of stochastic combinatorial pure exploration (CPE), in which an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to identify the best super arm. Among the various problem settings of CPE, we focus on the full-bandit setting, in which the agent cannot observe the reward of each single arm but only the sum of the rewards. Although CPE with full-bandit feedback can be regarded as a special case of pure exploration in linear bandits, an approach based on linear bandits is not computationally feasible, since the number of super arms may be exponential in the number of single arms. In this paper, we first propose a polynomial-time bandit algorithm for CPE under general combinatorial constraints and provide an upper bound on its sample complexity. Second, we design an approximation algorithm for the 0-1 quadratic maximization problem, which arises in many bandit algorithms with confidence ellipsoids. Based on this approximation algorithm, we propose novel bandit algorithms for the top-k selection problem and prove that they run in polynomial time. Finally, we conduct experiments on synthetic and real-world datasets and confirm the validity of our theoretical analysis in terms of both computation time and sample complexity.
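
To make the full-bandit model concrete, the following is a minimal Python/NumPy sketch under stated assumptions: Gaussian noise, super arms restricted to size-k subsets (the top-k setting), and naive uniform exploration followed by a least-squares estimate of the mean vector theta. All function names are hypothetical, and this is not the algorithm proposed in the paper. It deliberately enumerates all C(d, k) super arms, which is exactly the exponential cost that a polynomial-time method must avoid. The function `max_width_bruteforce` likewise illustrates one instance of the 0-1 quadratic maximization that arises with confidence ellipsoids, namely maximizing x^T A^{-1} x over indicator vectors x with k ones, solved here by exhaustive search rather than by an approximation algorithm.

```python
import itertools

import numpy as np


def pull(super_arm, theta, rng, noise_std=1.0):
    """Full-bandit feedback (assumed Gaussian noise): only the noisy SUM
    of the chosen arms' means is observed, never the individual rewards."""
    return float(theta[list(super_arm)].sum() + rng.normal(0.0, noise_std))


def estimate_theta(pulls, rewards, d):
    """Least-squares estimate of theta from (super arm, summed reward) pairs,
    treating each super arm as a 0-1 feature vector as in a linear bandit."""
    X = np.zeros((len(pulls), d))
    for row, arm in enumerate(pulls):
        X[row, list(arm)] = 1.0  # indicator vector of the super arm
    theta_hat, *_ = np.linalg.lstsq(X, np.asarray(rewards), rcond=None)
    return theta_hat


def naive_top_k(theta, k, n_rounds, seed=0):
    """Hypothetical baseline: cycle through every size-k super arm,
    then return the k arms with the largest estimated means."""
    rng = np.random.default_rng(seed)
    d = len(theta)
    # C(d, k) super arms: exponential in d, only feasible for tiny instances.
    super_arms = list(itertools.combinations(range(d), k))
    pulls, rewards = [], []
    for t in range(n_rounds):
        arm = super_arms[t % len(super_arms)]
        pulls.append(arm)
        rewards.append(pull(arm, theta, rng))
    theta_hat = estimate_theta(pulls, rewards, d)
    best = sorted(np.argsort(theta_hat)[-k:].tolist())
    return best, theta_hat


def max_width_bruteforce(A_inv, k):
    """Exhaustively solve max x^T A_inv x over 0-1 vectors x with exactly k ones."""
    d = A_inv.shape[0]
    best_val, best_arm = -np.inf, None
    for arm in itertools.combinations(range(d), k):
        idx = list(arm)
        val = A_inv[np.ix_(idx, idx)].sum()  # equals x^T A_inv x for indicator x
        if val > best_val:
            best_val, best_arm = val, arm
    return best_arm, best_val
```

For example, `naive_top_k(np.array([0.9, 0.8, 0.1, 0.2, 0.0]), k=2, n_rounds=2000)` should recover arms `[0, 1]` even though no single-arm reward is ever observed. Both loops touch every size-k subset, so their running time grows as C(d, k), which is why enumeration does not scale beyond small d.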
