Marcin Andrychowicz | Wojciech Zaremba | Vikash Kumar | Peter Welinder | Josh Tobin | Jonas Schneider | Bob McGrew | Bowen Baker | Matthias Plappert | Maciek Chociej | Glenn Powell | Alex Ray
[1] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[2] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[3] Sergey Levine, et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control, 2018, ICLR.
[4] Yang Liu, et al. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening, 2016, ICLR.
[5] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[6] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[7] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[8] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[9] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[10] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[11] Richard E. Turner, et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, 2017, NIPS.
[12] Kate Saenko, et al. Hierarchical Actor-Critic, 2017, ArXiv.
[13] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[14] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[15] Pieter Abbeel, et al. Reverse Curriculum Generation for Reinforcement Learning, 2017, CoRL.
[16] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[17] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[18] Filipe Wall Mutz, et al. Hindsight policy gradients, 2017, ICLR.
[19] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.