暂无分享,去创建一个
[1] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..
[2] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[3] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[4] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[5] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[6] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[7] Rémi Munos,et al. Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.
[8] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.