Implementation Matters in Deep RL: A Case Study on PPO and TRPO
Logan Engstrom | Andrew Ilyas | Shibani Santurkar | Dimitris Tsipras | Firdaus Janoos | Larry Rudolph | Aleksander Madry