Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning
暂无分享,去创建一个
[1] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[2] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[3] C. Watkins. Learning from delayed rewards , 1989 .
[4] Pawel Cichosz,et al. Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning , 1994, J. Artif. Intell. Res..
[5] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[6] Pawea Cichosz. Truncating Temporal Diierences: on the Eecient Implementation of Td for Reinforcement Learning , 1995 .
[7] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[8] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[9] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[10] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[11] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[12] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[13] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[14] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[15] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[16] Sergey Levine,et al. Learning deep neural network policies with continuous memory states , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[17] Harm van Seijen,et al. Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation , 2016, ArXiv.
[18] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.