A theoretical and empirical analysis of Expected Sarsa
Harm van Seijen | Hado van Hasselt | Shimon Whiteson | Marco Wiering