Alexandre M. Bayen | Pieter Abbeel | Sham M. Kakade | Vikash Kumar | Aravind Rajeswaran | Igor Mordatch | Yan Duan | Cathy Wu
[1] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[2] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[3] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[4] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[5] E. Todorov, et al. Analysis of the synergies underlying complex hand manipulation, 2004, The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society.
[6] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[7] Emanuel Todorov, et al. From task parameters to motor synergies: A hierarchical framework for approximately optimal control of redundant manipulators, 2005, J. Field Robotics.
[8] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[9] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[10] P. Faloutsos, et al. Motion Editing With Independent Component Analysis, 2009.
[11] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] Zoran Popovic, et al. Interactive Control of Diverse Complex Characters with Neural Networks, 2015, NIPS.
[14] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[15] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[16] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[17] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[18] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[19] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[20] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[21] Sergey Levine, et al. Guided Policy Search via Approximate Mirror Descent, 2016, NIPS.
[22] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic, 2016, ICLR.
[23] Yi Wu, et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, 2017, NIPS.
[24] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[25] Shimon Whiteson, et al. Counterfactual Multi-Agent Policy Gradients, 2017, AAAI.
[26] Sergey Levine, et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning, 2018, ICML.
[27] Sergey Levine, et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations, 2017, Robotics: Science and Systems.
[28] Ching-An Cheng, et al. Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods, 2019, CoRL.