An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms
