Fast and Data-Efficient Training of Rainbow: an Experimental Study on Atari

Across the Arcade Learning Environment, Rainbow achieves a level of performance competitive with humans and modern RL algorithms. However, attaining this performance requires large amounts of data and hardware resources, making research in this area computationally expensive and practical applications often infeasible. This paper’s contribution is threefold: we (1) propose an improved version of Rainbow that drastically reduces its data, training-time, and compute requirements while maintaining competitive performance; (2) empirically demonstrate the effectiveness of our approach through experiments on the Arcade Learning Environment; and (3) conduct a number of ablation studies to investigate the effect of each proposed modification. Our improved version of Rainbow reaches a median human-normalized score close to that of classic Rainbow while using 20 times less data and requiring only 7.5 hours of training on a single GPU. We also provide our full implementation, including pre-trained models.
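
For context, the median human-normalized score used above is the standard Atari evaluation metric: each game's raw score is rescaled so that a random policy maps to 0 and the human reference score maps to 1, and the median is taken over all games. The sketch below illustrates this computation; the helper function, the agent scores, and the specific baseline values are illustrative assumptions, not taken from the paper's released implementation or evaluation protocol.

```python
import numpy as np

def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Rescale a raw game score so that random play maps to 0 and human play to 1."""
    return (agent - random) / (human - random)

# Hypothetical per-game results: (agent score, random baseline, human reference).
# Baselines of this form are typically taken from published DQN/Rainbow tables.
scores = {
    "Breakout": (320.0, 1.7, 30.5),
    "Pong":     (19.0, -20.7, 14.6),
    "Seaquest": (4200.0, 68.4, 42054.7),
}

# Aggregate with the median, which is robust to a few games with outlier scores.
median_hns = np.median(
    [human_normalized_score(a, r, h) for a, r, h in scores.values()]
)
print(f"Median human-normalized score: {median_hns:.3f}")
```

The median (rather than the mean) is the common choice here because a single game with an extreme normalized score would otherwise dominate the aggregate.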
