A convergent O ( n ) algorithm for off-policy temporal-difference learning with linear function approximation
暂无分享,去创建一个
[1] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[2] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[3] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[4] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[5] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[6] Andrew Tridgell,et al. Experiments in Parameter Learning Using Temporal Differences , 1998, J. Int. Comput. Games Assoc..
[7] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[8] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[9] R. Sutton,et al. Off-policy Learning with Recognizers , 2000 .
[10] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[11] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[12] Jonathan Schaeffer,et al. Temporal Difference Learning Applied to a High-Performance Game-Playing Program , 2001, IJCAI.
[13] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[14] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[15] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[16] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[17] Vladislav Tadic,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001, Machine Learning.
[18] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[19] Doina Precup,et al. Off-policy Learning with Options and Recognizers , 2005, NIPS.
[20] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[21] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[22] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[23] Richard S. Sutton,et al. Temporal Abstraction in Temporal-difference Networks , 2005, NIPS.
[24] Nathan R. Sturtevant,et al. Feature Construction for Reinforcement Learning in Hearts , 2006, Computers and Games.
[25] Alborz Geramifard,et al. Incremental Least-Squares Temporal Difference Learning , 2006, AAAI.
[26] Richard S. Sutton,et al. Reinforcement Learning of Local Shape in the Game of Go , 2007, IJCAI.