A Neuroevolution Approach to General Atari Game Playing

This paper addresses the challenge of learning to play many different video games with little domain-specific knowledge. Specifically, it introduces a neuroevolution approach to general Atari 2600 game playing. Four neuroevolution algorithms were paired with three different state representations and evaluated on a set of 61 Atari games. The algorithms represent different points along the spectrum of algorithmic sophistication: weight evolution on topologically fixed neural networks (conventional neuroevolution), covariance matrix adaptation evolution strategy (CMA-ES), neuroevolution of augmenting topologies (NEAT), and indirect network encoding (HyperNEAT). The state representations are an object representation of the game screen, the raw pixels of the game screen, and seeded noise (a comparative baseline). Results indicate that direct-encoding methods work best on compact state representations, while indirect-encoding methods (i.e., HyperNEAT) allow scaling to higher-dimensional representations (i.e., the raw game screen). Previous approaches based on temporal-difference (TD) learning had trouble dealing with the large state spaces and sparse reward gradients often found in Atari games. Neuroevolution ameliorates these problems, and evolved policies achieve state-of-the-art results, even surpassing human high scores on three games. These results suggest that neuroevolution is a promising approach to general video game playing (GVGP).
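
To make the simplest of the four algorithms concrete, the following is a minimal sketch of weight evolution on a topologically fixed network. It is not the paper's implementation. The 2-2-1 network, the XOR task standing in for a game score, the truncation selection with elitism, and every name and parameter value (`forward`, `fitness`, `evolve`, `sigma`, and so on) are assumptions chosen for illustration only.

```python
import math
import random

# Fixed topology (an illustrative assumption): 2 inputs -> 2 tanh hidden units -> 1 tanh output.
N_WEIGHTS = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias

# Toy task standing in for a game: XOR input/target pairs.
XOR_CASES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def forward(weights, inputs):
    """Run the fixed-topology network on one input pair."""
    x0, x1 = inputs
    h0 = math.tanh(weights[0] * x0 + weights[1] * x1 + weights[2])
    h1 = math.tanh(weights[3] * x0 + weights[4] * x1 + weights[5])
    return math.tanh(weights[6] * h0 + weights[7] * h1 + weights[8])


def fitness(weights):
    """Hypothetical stand-in for a game score: negative squared error on XOR."""
    return -sum((forward(weights, x) - y) ** 2 for x, y in XOR_CASES)


def evolve(generations=200, pop_size=30, n_parents=10, sigma=0.3, seed=0):
    """Evolve the weight vector only; the network topology never changes."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(N_WEIGHTS)]
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:n_parents]  # truncation selection
        children = [list(parents[0])]     # elitism: best genome survives unchanged
        while len(children) < pop_size:
            parent = rng.choice(parents)
            # Gaussian mutation of every weight
            children.append([w + rng.gauss(0.0, sigma) for w in parent])
        population = children
    best = max(population, key=fitness)
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness at first generation:", history[0])
    print("best fitness after evolution:", fitness(best))
```

In this sketch the genome is a flat list of weights, which is a direct encoding: each gene maps to exactly one connection. In an Atari setting, the fitness function would presumably be the score obtained by letting the network choose actions in the game, which this toy example does not attempt to model.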
