Recurrent Experience Replay in Distributed Reinforcement Learning

Building on the recent successes of distributed training of RL agents, in this paper we investigate the training of RNN-based RL agents from distributed prioritized experience replay. We study the effects of parameter lag resulting in representational drift and recurrent state staleness and empirically derive an improved training strategy. Using a single network architecture and fixed set of hyperparameters, the resulting agent, Recurrent Replay Distributed DQN, quadruples the previous state of the art on Atari-57, and matches the state of the art on DMLab-30. It is the first agent to exceed human-level performance in 52 of the 57 Atari games.

Rémi Munos | Georg Ostrovski | Will Dabney | John Quan | Steven Kapturowski

[1] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.

[2] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.

[3] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.

[4] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.

[5] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[6] Marc G. Bellemare, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning, 2017, ICLR.

[7] P. J. Werbos. Backpropagation Through Time: What It Does and How to Do It, 1990.

[8] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.

[9] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[10] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.

[11] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.

[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[13] Wojciech Czarnecki, et al. Multi-task Deep Reinforcement Learning with PopArt, 2018, AAAI.

[14] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.

[15] Peter Stone, et al. Deep Recurrent Q-Learning for Partially Observable MDPs, 2015, AAAI Fall Symposia.

[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[17] Romain Laroche, et al. Hybrid Reward Architecture for Reinforcement Learning, 2017, NIPS.

[18] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[19] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[20] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.

[21] Rémi Munos, et al. Observe and Look Further: Achieving Consistent Performance on Atari, 2018, ArXiv.

[22] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.

[23] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[24] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.