The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
Marc G. Bellemare | Rémi Munos | Bilal Piot | Mohammad Gheshlaghi Azar | Will Dabney | Audrunas Gruslys
[1] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[2] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[3] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[4] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[5] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[6] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[7] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[8] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[9] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[10] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[11] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[12] Yang Liu, et al. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening, 2016, ICLR.
[13] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[14] Haitao Wang, et al. Deep reinforcement learning with experience replay based on SARSA, 2016, 2016 IEEE Symposium Series on Computational Intelligence (SSCI).
[15] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[16] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[17] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[18] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[19] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[20] Lihong Li, et al. Toward Minimax Off-policy Value Estimation, 2015, AISTATS.
[21] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[22] Ioannis Mitliagkas, et al. Asynchrony begets momentum, with an application to deep learning, 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[23] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[24] Tom Schaul, et al. FeUdal Networks for Hierarchical Reinforcement Learning, 2017, ICML.
[25] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.