Multi-step Reinforcement Learning: A Unifying Algorithm
Richard S. Sutton, J. Fernando Hernandez-Garcia, G. Zacharias Holland, Kristopher De Asis
[1] C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, 1989.
[2] M. I. Jordan et al. Technical report, Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[3] G. A. Rummery and M. Niranjan. On-line Q-learning Using Connectionist Systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
[4] G. A. Rummery. Problem Solving with Reinforcement Learning. PhD thesis, University of Cambridge, 1995.
[5] R. S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. In Advances in Neural Information Processing Systems (NIPS), 1995.
[6] D. Precup, R. S. Sutton, and S. Singh. Eligibility Traces for Off-Policy Policy Evaluation. In Proceedings of the International Conference on Machine Learning (ICML), 2000.
[7] C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 1992.
[8] S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms. Machine Learning, 2000.
[9] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[10] R. S. Sutton. Learning to Predict by the Methods of Temporal Differences. Machine Learning, 1988.
[11] H. van Seijen, H. van Hasselt, S. Whiteson, and M. Wiering. A Theoretical and Empirical Analysis of Expected Sarsa. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2009.