H. Francis Song | Martin A. Riedmiller | Abbas Abdolmaleki | Jost Tobias Springenberg | Nicolas Heess | Dhruva Tirumala | Arun Ahuja | Siqi Liu | Hubert Soyer | Aidan Clark | Jack W. Rae | Dan Belov | Seb Noury | Matthew M. Botvinick
[1] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[2] Sergio Gomez Colmenarejo,et al. TF-Replicator: Distributed Machine Learning for Researchers , 2019, ArXiv.
[3] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[4] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[5] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[6] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[7] Jan Peters,et al. Fitted Q-iteration by Advantage Weighted Regression , 2008, NIPS.
[8] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[9] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[10] Rémi Munos,et al. Recurrent Experience Replay in Distributed Reinforcement Learning , 2018, ICLR.
[11] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[12] Yee Whye Teh,et al. Neural probabilistic motor primitives for humanoid control , 2018, ICLR.
[13] David Silver,et al. Learning functions across many orders of magnitudes , 2016, ArXiv.
[14] Guy Lever,et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning , 2018, Science.
[15] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[16] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[17] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[18] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[19] Yiming Zhang,et al. Supervised Policy Update for Deep Reinforcement Learning , 2018, ICLR.
[20] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[21] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[22] N. Hansen,et al. Convergence Properties of Evolution Strategies with the Derandomized Covariance Matrix Adaptation: T , 1997 .
[23] Geoffrey E. Hinton,et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.
[24] Shie Mannor,et al. The Cross Entropy Method for Fast Policy Search , 2003, ICML.
[25] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[26] Yuval Tassa,et al. Relative Entropy Regularized Policy Iteration , 2018, ArXiv.
[27] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[28] Karen Simonyan,et al. Off-Policy Actor-Critic with Shared Experience Replay , 2020, ICML.
[29] Wojciech Czarnecki,et al. Multi-task Deep Reinforcement Learning with PopArt , 2018, AAAI.
[30] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[31] Andrew Zisserman,et al. Kickstarting Deep Reinforcement Learning , 2018, ArXiv.
[32] Luís Paulo Reis,et al. Deriving and improving CMA-ES with information geometric trust regions , 2017, GECCO.
[33] Alexandre M. Bayen,et al. Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines , 2018, ICLR.
[34] Nicolas Heess,et al. Hierarchical visuomotor control of humanoids , 2018, ICLR.
[35] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[36] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[37] Max Jaderberg,et al. Population Based Training of Neural Networks , 2017, ArXiv.
[38] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[39] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .