GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces

Richard S. Sutton | Hamid Reza Maei

[1] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.

[2] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS 2008.

[3] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[4] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[5] Doina Precup, et al. Intra-Option Learning about Temporally Abstract Actions, 1998, ICML.

[6] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.

[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[8] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.