The Multi-Armed Bandit Problem under Delayed Rewards Conditions in Digital Campaign Management
暂无分享,去创建一个
[1] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[2] Andreas Krause,et al. Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization , 2012, ICML.
[3] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[4] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[5] András György,et al. Online Learning under Delayed Feedback , 2013, ICML.
[6] Miguel Martín,et al. Possibilistic reward methods for the multi-armed bandit problem , 2018, Neurocomputing.
[7] John C. Duchi,et al. Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[8] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[9] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[10] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[11] Akimichi Takemura,et al. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models. , 2010, COLT 2010.