[1] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[2] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[3] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[4] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[5] Masashi Sugiyama, et al. Guide Actor-Critic for Continuous Control, 2017, ICLR.
[6] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, arXiv.
[7] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[8] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[9] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[10] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[11] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[12] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[13] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[14] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[15] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[16] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[17] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning, 2016, ICLR.
[18] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[19] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[20] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[21] Yasemin Altun, et al. Relative Entropy Policy Search, 2010, AAAI.
[22] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[23] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[24] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[25] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[26] Sham M. Kakade. On the sample complexity of reinforcement learning, 2003.
[27] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, arXiv.
[28] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[29] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[30] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[31] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.