Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards