Open Problem: Regret Bounds for Thompson Sampling
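The algorithm whose regret is at issue, Thompson Sampling (analyzed in [1] and evaluated empirically in [2]), can be sketched for the Bernoulli bandit case as follows. This is an illustrative sketch only; the function name, parameters, and the Beta(1, 1) priors are assumptions for the example, not taken from the paper.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors on each arm.

    true_means and horizon are illustrative parameters; returns the
    cumulative regret against the best arm over the given horizon.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [1] * k  # alpha parameter of each arm's Beta posterior
    failures = [1] * k   # beta parameter of each arm's Beta posterior
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample from each posterior and play the arm
        # whose sampled mean is largest.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        regret += best - true_means[arm]
    return regret
```

Because each arm's posterior concentrates as it is played, suboptimal arms are sampled less and less often, which is the behavior the sought regret bounds would quantify.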
[1] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[2] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[3] Benedict C. May. Simulation Studies in Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2011.
[4] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933, Biometrika.
[5] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math.
[6] John Langford, et al. Efficient Optimal Learning for Contextual Bandits, 2011, UAI.
[7] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[8] John Langford, et al. Contextual Bandit Learning with Predictable Rewards, 2012, AISTATS.
[9] Ole-Christoffer Granmo, et al. Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton, 2010, Int. J. Intell. Comput. Cybern.
[10] Steven L. Scott, et al. A modern Bayesian look at the multi-armed bandit, 2010.
[11] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[12] David S. Leslie, et al. Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2012, J. Mach. Learn. Res.
[13] Joaquin Quiñonero Candela, et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, 2010, ICML.