Tuomas Haarnoja | Aurick Zhou | Kristian Hartikainen | George Tucker | Sehoon Ha | Jie Tan | Vikash Kumar | Henry Zhu | Abhishek Gupta | Pieter Abbeel | Sergey Levine
[1] Andrew G. Barto, et al. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 1983.
[2] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 1992.
[3] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. arXiv:physics/0505066, 2005.
[4] Emanuel Todorov. Linearly-solvable Markov decision problems. NIPS, 2006.
[5] Brian D. Ziebart, et al. Maximum Entropy Inverse Reinforcement Learning. AAAI, 2008.
[6] Emanuel Todorov. General duality between optimal control and estimation. 47th IEEE Conference on Decision and Control (CDC), 2008.
[7] Jan Peters and Stefan Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 2008.
[8] Marc Toussaint. Robot trajectory optimization using approximate inference. ICML, 2009.
[9] Hamid R. Maei, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. NIPS, 2009.
[10] Brian D. Ziebart. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy. PhD thesis, Carnegie Mellon University, 2010.
[11] Konrad Rawlik, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference. Robotics: Science and Systems, 2012.
[12] Volodymyr Mnih, et al. Playing Atari with Deep Reinforcement Learning. arXiv preprint, 2013.
[13] Sergey Levine and Vladlen Koltun. Guided Policy Search. ICML, 2013.
[14] Philip S. Thomas. Bias in Natural Actor-Critic Algorithms. ICML, 2014.
[15] David Silver, et al. Deterministic Policy Gradient Algorithms. ICML, 2014.
[16] Nicolas Heess, et al. Learning Continuous Control Policies by Stochastic Value Gradients. NIPS, 2015.
[17] John Schulman, et al. Trust Region Policy Optimization. ICML, 2015.
[18] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. ICLR, 2015.
[19] Volodymyr Mnih, et al. Human-level control through deep reinforcement learning. Nature, 2015.
[20] Timothy P. Lillicrap, et al. Continuous control with deep reinforcement learning. ICLR, 2016.
[21] Yan Duan, et al. Benchmarking Deep Reinforcement Learning for Continuous Control. ICML, 2016.
[22] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates. UAI, 2016.
[23] Volodymyr Mnih, et al. Asynchronous Methods for Deep Reinforcement Learning. ICML, 2016.
[24] David Silver, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
[25] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies. Journal of Machine Learning Research, 2016.
[26] Gavin Kenneally, et al. Design Principles for a Family of Direct-Drive Legged Robots. IEEE Robotics and Automation Letters, 2016.
[27] Brendan O'Donoghue, et al. PGQ: Combining policy gradient and Q-learning. arXiv preprint, 2016.
[28] Audrunas Gruslys, et al. The Reactor: A Sample-Efficient Actor-Critic Architecture. arXiv preprint, 2017.
[29] Shixiang Gu, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. ICLR, 2017.
[30] Tuomas Haarnoja, et al. Reinforcement Learning with Deep Energy-Based Policies. ICML, 2017.
[31] Ofir Nachum, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning. NIPS, 2017.
[32] John Schulman, et al. Equivalence Between Policy Gradients and Soft Q-Learning. arXiv preprint, 2017.
[33] John Schulman, et al. Proximal Policy Optimization Algorithms. arXiv preprint, 2017.
[34] Shixiang Gu, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. ICRA, 2017.
[35] Peter Henderson, et al. Deep Reinforcement Learning that Matters. AAAI, 2018.
[36] Jie Tan, et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots. Robotics: Science and Systems, 2018.
[37] Scott Fujimoto, et al. Addressing Function Approximation Error in Actor-Critic Methods. ICML, 2018.
[38] Tuomas Haarnoja, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML, 2018.
[39] Abbas Abdolmaleki, et al. Maximum a Posteriori Policy Optimisation. ICLR, 2018.
[40] Ofir Nachum, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control. ICLR, 2018.
[41] Tuomas Haarnoja, et al. Latent Space Policies for Hierarchical Reinforcement Learning. ICML, 2018.
[42] Tuomas Haarnoja, et al. Composable Deep Reinforcement Learning for Robotic Manipulation. ICRA, 2018.
[43] Audrunas Gruslys, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning. ICLR, 2018.
[44] Henry Zhu, et al. Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost. ICRA, 2019.