Learning Continuous Control Policies by Stochastic Value Gradients
Yuval Tassa | David Silver | Nicolas Heess | Tom Erez | Timothy P. Lillicrap | Gregory Wayne
[1] David Q. Mayne, et al. Differential dynamic programming, 1972, The Mathematical Gazette.
[2] B. Widrow, et al. Neural networks for self-learning control systems, 1990, IEEE Control Systems Magazine.
[3] Michael I. Jordan, et al. Forward Models: Supervised Learning with a Distal Teacher, 1992, Cogn. Sci.
[4] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[5] Richard S. Sutton, et al. A Menu of Designs for Reinforcement Learning Over Time, 1995.
[6] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[7] Richard D. Braatz, et al. On the "Identification and control of dynamical systems using neural networks", 1997, IEEE Trans. Neural Networks.
[8] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[9] Rémi Coulom, et al. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. (Apprentissage par renforcement utilisant des réseaux de neurones, avec des applications au contrôle moteur), 2002.
[10] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[11] Longxin Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 2004, Machine Learning.
[12] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[13] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[14] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[15] Pieter Abbeel, et al. Using inaccurate models in reinforcement learning, 2006, ICML.
[16] William D. Smart, et al. Receding Horizon Differential Dynamic Programming, 2007, NIPS.
[17] Pawel Wawrzynski, et al. A Cat-Like Robot Real-Time Learning to Run, 2009, ICANNGA.
[18] Pawel Wawrzynski, et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay, 2009, Neural Networks.
[19] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[20] Michael Fairbank, et al. Value-gradient learning, 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).
[21] Christopher G. Atkeson, et al. Efficient robust policy optimization, 2012, 2012 American Control Conference (ACC).
[22] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[23] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[24] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[25] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[26] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[27] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[28] Muhammad Ghifary, et al. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies, 2015, ArXiv.
[29] I. Grondman, et al. Online Model Learning Algorithms for Actor-Critic Control, 2015.
[30] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[31] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.