Pure Exploration for Max-Quantile Bandits

We consider a variant of the pure exploration problem in Multi-Armed Bandits, where the goal is to find the arm whose $\lambda$-quantile is maximal. Within the PAC framework, we provide a lower bound on the sample complexity of any $(\epsilon,\delta)$-correct algorithm, and propose algorithms with matching upper bounds. Our bounds sharpen existing ones by explicitly incorporating the quantile level $\lambda$. We further report experiments comparing the sample complexity of our algorithms with that of previous work.
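To make the setting concrete, here is a minimal uniform-sampling sketch in Python; it is not the paper's algorithm, only an illustration of PAC max-quantile identification. It pulls every arm a DKW-based number of times and returns the arm with the largest empirical $\lambda$-quantile. The function name `naive_max_quantile` and the arm-sampling interface are hypothetical, and the quantile-shift guarantee stated in the comments is one common relaxation; the paper's precise $(\epsilon,\delta)$-correctness criterion may differ.

```python
import math
import numpy as np

def naive_max_quantile(arms, lam, eps, delta, rng=None):
    """Uniform-sampling sketch for PAC max-quantile identification.

    arms : list of callables; arms[i](n, rng) returns n i.i.d. reward
           samples from arm i (a hypothetical interface).
    Returns the index of the arm with the largest empirical
    lam-quantile after a DKW-based number of pulls per arm.
    """
    rng = rng or np.random.default_rng()
    k = len(arms)
    # DKW inequality: P(sup_x |F_hat(x) - F(x)| > t) <= 2 exp(-2 n t^2).
    # Taking t = eps/2 and a union bound over the k arms, every empirical
    # CDF is within eps/2 of the truth with probability >= 1 - delta.
    n = math.ceil(2.0 * math.log(2.0 * k / delta) / eps ** 2)
    best_arm, best_q = -1, -math.inf
    for i, draw in enumerate(arms):
        samples = np.sort(draw(n, rng))
        # Empirical lam-quantile: smallest sample x with F_hat(x) >= lam.
        q = samples[min(n - 1, max(0, math.ceil(lam * n) - 1))]
        if q > best_q:
            best_arm, best_q = i, q
    # On the good event, the returned arm's (lam + eps/2)-quantile is at
    # least the best arm's (lam - eps/2)-quantile: a quantile-shift
    # relaxation of exact optimality.
    return best_arm

# Usage: three Gaussian arms; the last has the largest median (0.5-quantile).
arms = [lambda n, rng, m=m: rng.normal(m, 1.0, size=n) for m in (0.0, 0.3, 1.0)]
print(naive_max_quantile(arms, lam=0.5, eps=0.1, delta=0.05))
```

Uniform sampling is wasteful compared with adaptive schemes that stop pulling clearly suboptimal arms early, which is the gap the paper's lower and upper bounds quantify.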
