Rogue-Gym: A New Challenge for Generalization in Reinforcement Learning

In this paper, we propose Rogue-Gym, a simple, classic-style roguelike game built for evaluating generalization in reinforcement learning (RL). Combined with recent progress in deep neural networks, RL has produced human-level agents without human knowledge in many games, such as those for the Atari 2600. However, it has been pointed out that agents trained with RL methods often overfit their training environment and perform poorly in even slightly different environments. To investigate this problem, several research environments with procedural content generation have been proposed. Following these studies, we propose the use of roguelikes as a benchmark for evaluating the generalization ability of RL agents. In Rogue-Gym, agents need to explore dungeons that are structured differently each time a new game starts. Thanks to the highly diverse dungeon structures, we believe that Rogue-Gym offers a sufficiently fair generalization benchmark. In our experiments, we evaluate a standard reinforcement learning method, proximal policy optimization (PPO), with and without enhancements for generalization. The results show that some enhancements believed to be effective fail to mitigate overfitting in Rogue-Gym, while others slightly improve generalization.
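To make the evaluation protocol concrete, the sketch below shows how an agent could be trained on a fixed set of dungeon seeds and then scored on unseen seeds through a Gym-style interface. This is a minimal illustration, not the paper's exact API: the environment id "RogueGym-v0", the seeding convention, and the random_policy stand-in for a trained PPO agent are all assumptions.

```python
# Minimal sketch of a train/test seed split for measuring generalization.
# Assumes a Gym-compatible Rogue-Gym environment; the env id "RogueGym-v0"
# and the seeding convention are illustrative, not the official API.
import gym

TRAIN_SEEDS = list(range(10))       # dungeon layouts seen during training
TEST_SEEDS = list(range(100, 110))  # unseen layouts held out for evaluation

def run_episode(env, policy, seed):
    """Play one episode on the dungeon generated from `seed`."""
    env.seed(seed)                  # fix the procedural dungeon layout
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs, env.action_space)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward

def random_policy(obs, action_space):
    # Stand-in for a trained PPO agent (hypothetical; training not shown).
    return action_space.sample()

env = gym.make("RogueGym-v0")       # hypothetical registration id
train_score = sum(run_episode(env, random_policy, s) for s in TRAIN_SEEDS) / len(TRAIN_SEEDS)
test_score = sum(run_episode(env, random_policy, s) for s in TEST_SEEDS) / len(TEST_SEEDS)

# A large gap between train_score and test_score indicates that the agent
# has overfit the training dungeons rather than learned a general policy.
print(f"train: {train_score:.1f}  test: {test_score:.1f}")
```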
