John Schulman | Filip Wolski | Prafulla Dhariwal | Alec Radford | Oleg Klimov
[1] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 1992.
[2] S. Kakade and J. Langford. Approximately Optimal Approximate Reinforcement Learning. ICML, 2002.
[3] I. Szita and A. Lőrincz. Learning Tetris Using the Noisy Cross-Entropy Method. Neural Computation, 2006.
[4] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.
[5] J. Schulman et al. Trust Region Policy Optimization. ICML, 2015.
[6] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. ICLR, 2015.
[7] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.
[8] M. G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research, 2013.
[9] Y. Duan et al. Benchmarking Deep Reinforcement Learning for Continuous Control. ICML, 2016.
[10] V. Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. ICML, 2016.
[11] J. Schulman et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation. ICLR, 2016.
[12] G. Brockman et al. OpenAI Gym. arXiv, 2016.
[13] Z. Wang et al. Sample Efficient Actor-Critic with Experience Replay. ICLR, 2017.
[14] N. Heess et al. Emergence of Locomotion Behaviours in Rich Environments. arXiv, 2017.