论文信息 - Open Problem: Adversarial Multiarmed Bandits with Limited Advice

Open Problem: Adversarial Multiarmed Bandits with Limited Advice

Adversarial multiarmed bandits with expert advice is one of the fundamental problems in studying the exploration-exploitation trade-o. It is known that if we observe the advice of all experts on every round we can achieve O(√KTlnN) regret, where K is the number of arms, T is the number of game rounds, and N is the number of experts. It is also known that if we observe the advice of just one expert on every round, we can achieve regret of order O(√NT). Our open problem is what can be achieved by asking M experts on every round, where 1 < M < N.

[1] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[2] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..

[3] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .

[4] John Langford,et al. Contextual Bandit Algorithms with Supervised Learning Guarantees , 2010, AISTATS.

[5] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..

[6] Ohad Shamir,et al. Efficient Learning with Partially Observed Attributes , 2010, ICML.

[7] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.

[8] Koby Crammer,et al. Prediction with Limited Advice and Multiarmed Bandits with Paid Observations , 2014, ICML.