[1] Steven L. Scott, et al. A modern Bayesian look at the multi-armed bandit, 2010.
[2] Joaquin Quiñonero Candela, et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, 2010, ICML.
[3] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[4] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[5] Günther Palm, et al. Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax, 2011, KI.
[6] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[7] Doina Precup, et al. Algorithms for multi-armed bandit problems, 2014, ArXiv.
[8] Patrick Jaillet, et al. Real-Time Bidding with Side Information, 2017, NIPS.
[9] N. Gatti, et al. Multi-Armed Bandit for Pricing, 2015.
[10] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[11] Omar Besbes, et al. Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-Stationary Rewards, 2014, Stochastic Systems.
[12] Aurélien Garivier, et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems, 2008, arXiv:0805.3415.
[13] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.