Pure Exploration for Max-Quantile Bandits

We consider a variant of the pure exploration problem in Multi-Armed Bandits, where the goal is to find the arm whose $\lambda$-quantile is maximal. Within the PAC framework, we provide a lower bound on the sample complexity of any $(\epsilon,\delta)$-correct algorithm, and propose algorithms with matching upper bounds. Our bounds sharpen existing ones by explicitly incorporating the quantile level $\lambda$. We further report experiments comparing the sample complexity of our algorithms with that of previous work.
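To make the setting concrete, here is a minimal uniform-sampling sketch in Python; it is not the paper's algorithm, only an illustration of PAC max-quantile identification. It pulls every arm a DKW-based number of times and returns the arm with the largest empirical $\lambda$-quantile. The function name `naive_max_quantile` and the arm-sampling interface are hypothetical, and the quantile-shift guarantee stated in the comments is one common relaxation; the paper's precise $(\epsilon,\delta)$-correctness criterion may differ.

```python
import math
import numpy as np

def naive_max_quantile(arms, lam, eps, delta, rng=None):
    """Uniform-sampling sketch for PAC max-quantile identification.

    arms : list of callables; arms[i](n, rng) returns n i.i.d. reward
           samples from arm i (a hypothetical interface).
    Returns the index of the arm with the largest empirical
    lam-quantile after a DKW-based number of pulls per arm.
    """
    rng = rng or np.random.default_rng()
    k = len(arms)
    # DKW inequality: P(sup_x |F_hat(x) - F(x)| > t) <= 2 exp(-2 n t^2).
    # Taking t = eps/2 and a union bound over the k arms, every empirical
    # CDF is within eps/2 of the truth with probability >= 1 - delta.
    n = math.ceil(2.0 * math.log(2.0 * k / delta) / eps ** 2)
    best_arm, best_q = -1, -math.inf
    for i, draw in enumerate(arms):
        samples = np.sort(draw(n, rng))
        # Empirical lam-quantile: smallest sample x with F_hat(x) >= lam.
        q = samples[min(n - 1, max(0, math.ceil(lam * n) - 1))]
        if q > best_q:
            best_arm, best_q = i, q
    # On the good event, the returned arm's (lam + eps/2)-quantile is at
    # least the best arm's (lam - eps/2)-quantile: a quantile-shift
    # relaxation of exact optimality.
    return best_arm

# Usage: three Gaussian arms; the last has the largest median (0.5-quantile).
arms = [lambda n, rng, m=m: rng.normal(m, 1.0, size=n) for m in (0.0, 0.3, 1.0)]
print(naive_max_quantile(arms, lam=0.5, eps=0.1, delta=0.05))
```

Uniform sampling is wasteful compared with adaptive schemes that stop pulling clearly suboptimal arms early, which is the gap the paper's lower and upper bounds quantify.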
