A note on the Bayesian regret of Thompson Sampling with an arbitrary prior

We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We show that for any prior distribution, the Thompson Sampling strategy achieves a Bayesian regret bounded from above by $14 \sqrt{nK}$, where $n$ is the time horizon and $K$ is the number of arms. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20} \sqrt{nK}$.
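To make the quantity in the bound concrete, here is a minimal Python sketch that estimates the Bayesian regret of Thompson Sampling by Monte Carlo. The Bayesian regret is $\mathbb{E}\sum_{t=1}^{n}(\max_i \mu_i - \mu_{I_t})$, where the expectation is over the prior draw of the arm means $\mu_1, \dots, \mu_K$ and over the randomness of the arms $I_t$ that are played. The paper's result holds for an arbitrary prior. This sketch instead assumes Bernoulli rewards with independent uniform (Beta(1, 1)) priors on the means. The function names `thompson_sampling_regret` and `bayesian_regret` and the Monte Carlo parameters are illustrative choices, not taken from the paper.

```python
import math
import random


def thompson_sampling_regret(n, K, rng):
    """Run Thompson Sampling for n rounds on one K-armed Bernoulli bandit whose
    means are drawn from the prior, and return the pseudo-regret
    sum_t (max_i mu_i - mu_{I_t}) for that draw."""
    # Assumption for illustration: independent Uniform(0, 1) = Beta(1, 1) priors.
    means = [rng.random() for _ in range(K)]
    best = max(means)
    # Beta posterior parameters per arm: (successes + 1, failures + 1).
    alpha = [1] * K
    beta = [1] * K
    regret = 0.0
    for _ in range(n):
        # Sample one mean per arm from its posterior and play the largest sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        arm = max(range(K), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        if rng.random() < means[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        regret += best - means[arm]
    return regret


def bayesian_regret(n, K, runs=50, seed=0):
    """Monte Carlo estimate of the Bayesian regret: pseudo-regret averaged over
    instances drawn from the prior and over the algorithm's randomness."""
    rng = random.Random(seed)
    total = sum(thompson_sampling_regret(n, K, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    n, K = 2000, 3
    print(f"estimated Bayesian regret: {bayesian_regret(n, K):.1f}")
    print(f"upper bound 14*sqrt(nK):   {14 * math.sqrt(n * K):.1f}")
```

Each round contributes at most 1 to the regret, so the Bayesian regret is trivially at most $n$. The bound $14\sqrt{nK}$ is therefore only informative once $n > 196K$. The example values $n = 2000$, $K = 3$ are chosen to lie in that regime.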

Sébastien Bubeck | Che-Yu Liu
