Sergey Levine | Ilya Sutskever | Shixiang Gu | Timothy P. Lillicrap
[1] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[2] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[3] Richard S. Sutton,et al. The Truck Backer-Upper: An Example of Self-Learning in Neural Networks , 1995 .
[4] Mance E. Harmon,et al. Multi-Agent Residual Advantage Learning with General Function Approximation. , 1996 .
[5] Mark Harmon. Multi-player residual advantage learning with general function , 1996 .
[6] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[7] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[8] J. Tsitsiklis,et al. Actor-critic algorithms , 1999, NIPS 1999.
[9] Emanuel Todorov,et al. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems , 2004, ICINCO.
[10] Andrew W. Moore,et al. Locally Weighted Learning for Control , 1997, Artificial Intelligence Review.
[11] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[12] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[13] Marc Toussaint,et al. Approximate Inference and Stochastic Optimal Control , 2010, ArXiv.
[14] Martin A. Riedmiller,et al. Reinforcement learning in feedback control , 2011, Machine Learning.
[15] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[16] Marc Toussaint,et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , 2012, Robotics: Science and Systems.
[17] Yuval Tassa,et al. Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[18] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[19] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[20] Jürgen Schmidhuber,et al. Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.
[21] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[22] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[23] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[24] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[25] Martin A. Riedmiller,et al. Approximate model-assisted Neural Fitted Q-Iteration , 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[26] Thomas B. Schön,et al. From Pixels to Torques: Policy Learning with Deep Dynamical Models , 2015, ICML 2015.
[27] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[28] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[29] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[30] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[31] Karl Tuyls,et al. The importance of experience replay database composition in deep reinforcement learning , 2015 .
[32] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[33] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[34] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[35] Sergey Levine,et al. One-shot learning of manipulation skills with online dynamics adaptation and neural network priors , 2015, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[36] Peter Stone,et al. Deep Reinforcement Learning in Parameterized Action Space , 2015, ICLR.
[37] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[38] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[39] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[40] Tapani Raiko,et al. International Conference on Learning Representations (ICLR) , 2016 .