Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of problem settings and has been receiving increasing attention from the scientific community. In this paper we take a big-picture look at how the ALE is being used by the research community. We focus on how diverse the evaluation methodologies in the ALE have become, and we highlight some key concerns that arise when evaluating agents on this platform. We use this discussion to present what we consider the best practices for future evaluations in the ALE. To further progress in the field, we also introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions: with some probability, the environment repeats the agent's previous action instead of the one it just selected.
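
To make these two additions concrete, the sketch below shows how they might be exercised through the ALE's Python bindings. This is a minimal sketch, assuming the `ale_py` package and its `ALEInterface`; the ROM path is illustrative, and exact key spellings (e.g., `repeat_action_probability`) and method availability can vary across releases. Sticky actions repeat the agent's previous action with probability ς (the paper's recommended default is ς = 0.25), which injects stochasticity and defeats agents that simply memorize fixed action sequences on the otherwise deterministic emulator.

```python
# Minimal sketch: enabling sticky actions and selecting a game mode
# through the ALE's Python bindings (assumes the `ale_py` package).
from ale_py import ALEInterface

ale = ALEInterface()

# Sticky actions: with probability 0.25 the emulator repeats the
# agent's previous action instead of the newly chosen one.
ale.setFloat("repeat_action_probability", 0.25)

# Load a ROM (the path here is illustrative).
ale.loadROM("roms/breakout.bin")

# Multiple game modes and difficulties: enumerate what this ROM
# supports, then select one before starting the episode.
modes = ale.getAvailableModes()
difficulties = ale.getAvailableDifficulties()
ale.setMode(modes[0])
ale.setDifficulty(difficulties[0])
ale.reset_game()

# Standard interaction loop over the minimal action set.
actions = ale.getMinimalActionSet()
total_reward = 0
while not ale.game_over():
    a = actions[0]  # placeholder policy: always the first action
    total_reward += ale.act(a)
print("episode return:", total_reward)
```

Because the repeated action is the agent's own previous choice, sticky actions add stochasticity without altering the game dynamics themselves, in contrast to alternatives such as random no-op starts or randomized frame skips.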
