Speedy Q-Learning
Hilbert J. Kappen | Rémi Munos | Mohammad Ghavamzadeh | Mohammad Gheshlaghi Azar
[1] William Feller, et al. An Introduction to Probability Theory and Its Applications, 1951.
[2] B. Harshbarger. An Introduction to Probability Theory and its Applications, Volume I, 1958.
[3] William Feller, et al. An Introduction to Probability Theory and Its Applications, 1967.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[5] C. Watkins. Learning from delayed rewards, 1989.
[6] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[7] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[9] Csaba Szepesvári, et al. The Asymptotic Convergence-Rate of Q-learning, 1997, NIPS.
[10] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[11] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[12] Yishay Mansour, et al. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[13] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.
[14] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[17] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[18] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[19] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[20] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[21] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[22] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[23] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[24] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[25] H. Kappen, et al. Reinforcement Learning with a Near Optimal Rate of Convergence, 2011.