[1] Chuanyu Yang, et al. Recurrent Deterministic Policy Gradient Method for Bipedal Locomotion on Rough Terrain Challenge, 2017, 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV).
[2] Xiaoxiao Guo. Deep Learning and Reward Design for Reinforcement Learning, 2017.
[3] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[4] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.
[5] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[6] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[7] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[8] Sergey Levine, et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, 2016, ICLR.
[9] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[10] Jürgen Schmidhuber, et al. World Models, 2018, ArXiv.
[11] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[12] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[13] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[14] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[15] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.