Trust Region-Guided Proximal Policy Optimization
Hao He | Xiaoyang Tan | Yuhui Wang | Yaozhong Gan
[1] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[2] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[3] Alexander J. Smola, et al. P3O: Policy-on Policy-off Policy Optimization, 2019, UAI.
[4] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[5] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[6] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, arXiv.
[7] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[8] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[9] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[10] Gang Chen, et al. An Adaptive Clipping Approach for Proximal Policy Optimization, 2018, arXiv.
[11] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[14] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[15] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[16] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[17] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[18] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[19] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[20] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[21] Hao He, et al. Truly Proximal Policy Optimization, 2019, UAI.
[22] Anind K. Dey, et al. Modeling Interaction via the Principle of Maximum Causal Entropy, 2010, ICML.
[23] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[24] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.