Paper information - Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration - 字舞流文

Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

We study the combinatorial pure exploration problem Best-Set in stochastic multi-armed bandits. In a Best-Set instance, we are given $n$ arms with unknown reward distributions, as well as a family $\mathcal{F}$ of feasible subsets over the arms. Our goal is to identify the feasible subset in $\mathcal{F}$ with the maximum total mean using as few samples as possible. The problem generalizes the classical best arm identification problem and the top-$k$ arm identification problem, both of which have attracted significant attention in recent years. We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm that matches the lower bound up to a factor of $\ln|\mathcal{F}|$. For an important class of combinatorial families, we also provide a polynomial-time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex programs and approximate Pareto curves in multi-objective optimization. We also show, through a nontrivial lower bound construction, that the $\ln|\mathcal{F}|$ factor is inevitable in general. Our results significantly improve on previous results for several important combinatorial constraints and provide a tighter understanding of the general Best-Set problem.

We further introduce an even more general problem, formulated in geometric terms. We are given $n$ Gaussian arms with unknown means and unit variance. Consider the $n$-dimensional Euclidean space $\mathbb{R}^n$ and a collection $\mathcal{O}$ of disjoint subsets of it. Our goal is to determine the subset in $\mathcal{O}$ that contains the $n$-dimensional vector of the means. The problem generalizes most pure exploration bandit problems studied in the literature. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.
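To make the Best-Set definition concrete, here is a minimal Python sketch of a naive baseline: sample every arm the same number of times, then return the feasible subset with the largest empirical total mean. This is not the sampling algorithm from the paper, and it comes with no optimality guarantee. The names `best_set`, `uniform_sampling_best_set`, and `top_k_family`, the Gaussian arms, and the fixed budget of 2000 samples per arm are all assumptions chosen for illustration.

```python
import itertools
import random


def best_set(means, family):
    """Return the feasible subset in `family` with the largest total mean."""
    return max(family, key=lambda subset: sum(means[i] for i in subset))


def uniform_sampling_best_set(sample_arm, n_arms, family, samples_per_arm):
    """Naive baseline (an assumption, not the paper's method):
    pull every arm equally often, then optimize over empirical means."""
    estimates = []
    for arm in range(n_arms):
        total = sum(sample_arm(arm) for _ in range(samples_per_arm))
        estimates.append(total / samples_per_arm)
    return best_set(estimates, family)


def top_k_family(n_arms, k):
    """Feasible family for top-k arm identification: all k-subsets of arms.
    With k = 1 this reduces to best arm identification."""
    return [frozenset(c) for c in itertools.combinations(range(n_arms), k)]


# Hypothetical instance: 4 unit-variance Gaussian arms, top-2 identification.
true_means = [1.0, 0.8, 0.2, 0.0]
rng = random.Random(0)
family = top_k_family(4, 2)
answer = uniform_sampling_best_set(
    lambda arm: rng.gauss(true_means[arm], 1.0), 4, family, 2000
)
print(sorted(answer))
```

Uniform sampling ignores the instance structure: it spends as many samples on arms that are easy to classify as on arms near the decision boundary. An instance-wise sample complexity bound, as studied in the abstract above, measures how much better an adaptive strategy can do on each particular instance.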

Ruosong Wang | Jian Li | Anupam Gupta | Lijie Chen | Mingda Qiao
