Wei Chu | John Langford | Lihong Li
[1] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[2] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[3] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[4] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[5] H. Vincent Poor, et al. Bandit problems with side observations, 2005, IEEE Transactions on Automatic Control.
[6] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[7] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[8] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[9] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable, 1979.
[10] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[11] Bee-Chung Chen, et al. Explore/Exploit Schemes for Web Content Optimization, 2009, Ninth IEEE International Conference on Data Mining.
[12] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[13] Brian Tanner, et al. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments, 2009, J. Mach. Learn. Res.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[16] Donald A. Berry, et al. Bandit Problems: Sequential Allocation of Experiments, 1986.
[17] Chris Mesterharm, et al. Experience-efficient learning in associative bandit problems, 2006, ICML.
[18] Christopher J. Merz, et al. UCI Repository of Machine Learning Databases, 1996.
[19] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[20] John Langford, et al. Exploration scavenging, 2008, ICML '08.
[21] Leslie Pack Kaelbling, et al. Associative Reinforcement Learning: Functions in k-DNF, 1994, Machine Learning.
[22] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[23] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[24] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.