Leveraging Procedural Generation to Benchmark Reinforcement Learning

We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning. We believe the community will benefit from increased access to high-quality training environments, and we provide detailed experimental protocols for using this benchmark. We empirically demonstrate that diverse environment distributions are essential to adequately train and evaluate RL agents, thereby motivating the extensive use of procedural content generation. We then use this benchmark to investigate the effects of scaling model size, finding that larger models significantly improve both sample efficiency and generalization.
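The benchmark's environments are exposed through the standard Gym interface, and the core experimental protocol is to train on a fixed, finite set of levels and evaluate on the full level distribution. The sketch below illustrates that split, assuming the procgen package's Gym registration; the level counts are illustrative rather than the paper's exact settings.

```python
import gym

# Train on a fixed, finite set of procedurally generated levels.
train_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=200,            # size of the training level set (illustrative)
    start_level=0,             # seed offset selecting which levels are used
    distribution_mode="easy",  # difficulty setting of the environment
)

# Evaluate on the full level distribution (num_levels=0 means unrestricted),
# so the gap between train and test returns measures generalization.
test_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=0,
    start_level=0,
    distribution_mode="easy",
)

# Minimal interaction loop with a random policy.
obs = train_env.reset()
for _ in range(100):
    obs, reward, done, info = train_env.step(train_env.action_space.sample())
    if done:
        obs = train_env.reset()
```

Because each environment draws its levels from seeded procedural generation, varying num_levels directly controls the diversity of the training distribution, which is what makes the benchmark suitable for studying overfitting in RL.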
