A stochastic bandit algorithm for scratch games

Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. The algorithms based on upper confidence bounds offer strong theoretical guarantees, are easy to implement, and are efficient in practice. We consider a new bandit setting, called "scratch games", where arm budgets are limited and rewards are drawn without replacement. Using Serfling's inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound on the expected number of plays of a suboptimal arm is lower than that of the UCB1 policy. We illustrate this result on both synthetic problems and realistic problems (ad-serving and emailing campaign optimization).
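To make the idea concrete, the following is a minimal sketch of a UCB-style policy whose exploration bonus uses a Serfling-type confidence radius for sampling without replacement: the factor (1 - (n-1)/N) shrinks the bonus as an arm's finite budget N is consumed, and exhausted arms are removed from play. The class name, the exploration constant `alpha`, and the initialization rule are illustrative assumptions, not the paper's exact specification.

```python
import math

class ScratchUCB:
    """Illustrative UCB-style policy for "scratch game" arms: each arm i
    holds a finite budget N_i of tickets and rewards are drawn without
    replacement.  Sketch only; not the paper's exact algorithm."""

    def __init__(self, budgets, alpha=2.0):
        self.budgets = list(budgets)        # N_i: tickets left to scratch on arm i
        self.alpha = alpha                  # exploration constant (assumed value)
        self.counts = [0] * len(budgets)    # n_i: number of plays of arm i
        self.sums = [0.0] * len(budgets)    # cumulative reward of arm i
        self.t = 0                          # global time step

    def select(self):
        """Return the non-exhausted arm with the highest upper bound,
        or None once every arm's budget is spent."""
        self.t += 1
        best_arm, best_index = None, -math.inf
        for i, (n, N) in enumerate(zip(self.counts, self.budgets)):
            if n >= N:
                continue            # arm exhausted: no ticket left
            if n == 0:
                return i            # play each available arm once first
            mean = self.sums[i] / n
            # Serfling-type radius: the (1 - (n-1)/N) factor accounts for
            # sampling without replacement from a budget of N tickets.
            radius = math.sqrt(
                (1 - (n - 1) / N) * self.alpha * math.log(self.t) / (2 * n)
            )
            if mean + radius > best_index:
                best_index, best_arm = mean + radius, i
        return best_arm

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
```

As the budget of a frequently played arm runs down, its confidence radius collapses toward zero faster than the Hoeffding-based UCB1 radius would, which is the intuition behind the tighter bound on suboptimal plays claimed above.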