Gradient temporal-difference learning algorithms
[1] R. Bellman,et al. Adaptive Control Processes , 1964 .
[2] Lennart Ljung,et al. Analysis of recursive stochastic algorithms , 1977 .
[3] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[4] C. Watkins. Learning from delayed rewards , 1989 .
[5] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[6] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[7] Richard S. Sutton,et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.
[8] Etienne Barnard,et al. Temporal-difference methods and Markov models , 1993, IEEE Trans. Syst. Man Cybern..
[9] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[10] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[11] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[12] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[13] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[14] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[15] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[16] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[17] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[18] B. Delyon. General results on the convergence of stochastic algorithms , 1996, IEEE Trans. Autom. Control..
[19] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[20] John N. Tsitsiklis,et al. Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks , 1997, NIPS.
[21] Harold J. Kushner,et al. Stochastic Approximation Algorithms and Applications , 1997, Applications of Mathematics.
[22] V. Borkar. Stochastic approximation with two time scales , 1997 .
[23] Doina Precup,et al. Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.
[24] Andrew Tridgell,et al. Experiments in Parameter Learning Using Temporal Differences , 1998, J. Int. Comput. Games Assoc..
[25] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[26] L. Baird. Reinforcement Learning Through Gradient Descent , 1999 .
[27] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[28] Stuart J. Russell,et al. Convergence of Reinforcement Learning with General Function Approximators , 1999, IJCAI.
[29] R. Sutton,et al. Off-policy Learning with Recognizers , 2000 .
[30] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region , 2000, NIPS.
[31] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[32] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[33] Jonathan Schaeffer,et al. Temporal Difference Learning Applied to a High-Performance Game-Playing Program , 2001, IJCAI.
[34] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[35] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[36] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[37] Vladislav Tadic,et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation , 2001, Machine Learning.
[38] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[39] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.
[40] Doina Precup,et al. Off-policy Learning with Options and Recognizers , 2005, NIPS.
[41] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[42] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[43] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[44] Alborz Geramifard,et al. Incremental Least-Squares Temporal Difference Learning , 2006, AAAI.
[45] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[46] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[47] Richard S. Sutton,et al. Reinforcement Learning of Local Shape in the Game of Go , 2007, IJCAI.
[48] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[49] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008, Texts and Readings in Mathematics.
[50] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[51] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[52] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[53] David Silver,et al. Reinforcement learning and simulation-based search in computer Go , 2009 .
[54] R. Sutton. The Grand Challenge of Predictive Empirical Abstract Knowledge , 2009 .
[55] Bruno Scherrer,et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view , 2010, ICML.
[56] Ashique Mahmood. Automatic step-size adaptation in incremental supervised learning , 2010 .
[57] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[58] Sanjoy Dasgupta,et al. Adaptive Control Processes , 2010, Encyclopedia of Machine Learning and Data Mining.
[59] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[60] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[62] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.