Contextual Bandits with Linear Payoff Functions
Wei Chu | Lihong Li | Robert E. Schapire | Lev Reyzin
[1] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[2] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[3] Leslie Pack Kaelbling, et al. Associative Reinforcement Learning: Functions in k-DNF, 1994, Machine Learning.
[4] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[5] Deepak Agarwal, et al. Online Models for Content Optimization, 2008, NIPS.
[6] Chris Mesterharm, et al. Experience-efficient learning in associative bandit problems, 2006, ICML.
[7] Dimitris K. Tasoulis, et al. Simulation Studies of Multi-armed Bandits with Covariates (Invited Paper), 2008, Tenth International Conference on Computer Modeling and Simulation (UKSim 2008).
[8] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[9] J. Sarkar. One-Armed Bandit Problems with Covariates, 1991.
[10] Philip M. Long, et al. Reinforcement Learning with Immediate Rewards and Linear Hypotheses, 2003, Algorithmica.
[11] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable, 1979.
[12] Thomas J. Walsh, et al. Exploring compact reinforcement-learning representations with linear regression, 2009, UAI.
[13] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[14] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[15] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[16] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[17] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.