Paper information - A simple multi-armed bandit algorithm with optimal variation-bounded regret


We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.
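To make the quantity in the bound concrete, the sketch below computes a total quadratic variation and the resulting bound scale for a small loss table. It assumes the usual definition from the variation-bounded regret literature, in which each arm's variation is the sum of squared deviations of its per-round loss from its mean over all rounds, and Q sums this over the arms. The abstract itself does not spell out this definition, and the function names `total_quadratic_variation` and `bound_scale` are hypothetical, chosen only for illustration.

```python
import math


def total_quadratic_variation(losses):
    """Sum over arms of squared deviations from each arm's mean loss.

    losses[t][i] is the loss of arm i in round t (assumed layout).
    """
    rounds = len(losses)
    arms = len(losses[0])
    # Mean loss of each arm over all rounds.
    means = [sum(row[i] for row in losses) / rounds for i in range(arms)]
    return sum((row[i] - means[i]) ** 2 for row in losses for i in range(arms))


def bound_scale(q, rounds):
    """The quantity sqrt(Q log T) inside the O(.) bound, constants omitted."""
    return math.sqrt(q * math.log(rounds))


# Two rounds, two arms: arm 0 alternates between 0 and 1, arm 1 is constant.
losses = [[0.0, 1.0], [1.0, 1.0]]
q = total_quadratic_variation(losses)
print(q, bound_scale(q, len(losses)))
```

An arm whose loss never changes contributes nothing to Q, which is why a bound in terms of Q can be much smaller than one in terms of T when the losses vary little.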

Elad Hazan | Satyen Kale

[1] Yishay Mansour, et al. Improved second-order bounds for prediction with expert advice, 2006, Machine Learning.

[2] Elad Hazan, et al. Better Algorithms for Benign Bandits, 2009, J. Mach. Learn. Res.

[3] Elad Hazan, et al. On Stochastic and Worst-case Models for Investing, 2009, NIPS.

[4] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[5] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[6] Elad Hazan, et al. Extracting certainty from uncertainty: regret bounded by variation in costs, 2008, Machine Learning.