Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
Shalabh Bhatnagar | Doina Precup | Richard S. Sutton | Csaba Szepesvári | David Silver | Hamid Reza Maei
[1] Thomas L. Griffiths, et al. Advances in Neural Information Processing Systems 21, 1993, NIPS 2009.
[2] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[3] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[4] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[5] Wei Zhang, et al. A Reinforcement Learning Approach to job-shop Scheduling, 1995, IJCAI.
[6] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[7] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[8] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.
[9] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[10] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control. Optim.
[11] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[14] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[15] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[16] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS 2008.
[17] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[18] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[19] David Silver, et al. Reinforcement learning and simulation-based search in computer Go, 2009.