Matthew W. Hoffman | David Budden | Nicolas Heess | Timothy P. Lillicrap | Gabriel Barth-Maron | Will Dabney | Dan Horgan | TB Dhruva | Alistair Muldal
[1] R. Mazo. On the theory of Brownian motion, 1973.
[2] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[3] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[4] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[5] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[6] Masashi Sugiyama, et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[7] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[8] Stuart D. Harshbarger, et al. An Overview of the Developmental Process for the Modular Prosthetic Limb, 2011.
[9] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[10] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[11] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[13] Vikash Kumar, et al. MuJoCo HAPTIX: A virtual reality system for hand manipulation, 2015, 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids).
[14] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[15] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[16] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[17] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[18] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[19] Yuval Tassa, et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.
[20] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[21] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[22] Yuval Tassa, et al. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation, 2017, ArXiv.
[23] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[24] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.