Emergent Tangled Graph Representations for Atari Game Playing Agents

Organizing code into coherent programs and relating different programs to each other is an underlying requirement for scaling genetic programming to more difficult task domains. A model in which policies are defined by teams of programs, with teams and programs represented by independent, coevolved populations, has previously been shown to support the development of variable-sized teams. In this work, we generalize the approach to provide a complete framework for organizing multiple teams into arbitrarily deep/wide structures through a process of continuous evolution; hereafter the Tangled Program Graph (TPG). Benchmarking is conducted using a subset of 20 games from the Arcade Learning Environment (ALE), an Atari 2600 video game emulator. The games considered here correspond to those in which deep learning was unable to reach a threshold of play consistent with that of a human. Information provided to the learning agent is limited to that which a human would experience: screen capture sensory input, Atari joystick actions, and game score. The performance of the proposed approach exceeds that of deep learning in 15 of the 20 games, with 7 of the 15 also exceeding that associated with a human level of competence. Moreover, in contrast to solutions from deep learning, solutions discovered by TPG are also very ‘sparse’. Rather than assuming that all of the state space contributes to every decision, each action in TPG is resolved following execution of a subset of an individual’s graph. This results in significantly lower computational requirements for model building than is presently the case for deep learning.
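The idea that each action is resolved by executing only a subset of an individual's graph can be illustrated with a minimal sketch. The abstract does not specify the mechanism, so the following are assumptions made purely for illustration: each program in a team produces a bid from the state, the highest bidder's action is either an atomic action or a pointer to another team, and teams already visited in the current decision are skipped so that cycles cannot trap the traversal. The names Program, Team, and resolve_action, and the hand-written bid functions, are hypothetical stand-ins for evolved components.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, Union


@dataclass(eq=False)
class Program:
    # Hypothetical stand-in for an evolved program: maps a state to a bid.
    bid: Callable[[Sequence[float]], float]
    # Either an atomic action (an int) or a pointer to another Team.
    action: Union[int, "Team"]


@dataclass(eq=False)
class Team:
    name: str
    programs: List[Program] = field(default_factory=list)


def resolve_action(root: Team, state: Sequence[float]) -> Tuple[int, List[str]]:
    """Follow the highest bidder from the root team until an atomic action wins.

    Returns the chosen action and the names of the teams that were executed.
    """
    visited: List[str] = []
    team = root
    while True:
        visited.append(team.name)
        # Assumed rule: ignore programs that point back to a visited team.
        candidates = [
            p for p in team.programs
            if not (isinstance(p.action, Team) and p.action.name in visited)
        ]
        if not candidates:
            raise ValueError(f"team {team.name!r} has no usable program")
        winner = max(candidates, key=lambda p: p.bid(state))
        if isinstance(winner.action, Team):
            team = winner.action  # descend into the referenced team
        else:
            return winner.action, visited


# A tiny hand-built graph with one cycle (left -> root).
root = Team("root")
left = Team("left")
right = Team("right")
left.programs = [
    Program(lambda s: s[0], 1),
    Program(lambda s: -s[0], 2),
    Program(lambda s: 10.0, root),  # always bids highest, but root is visited
]
right.programs = [Program(lambda s: 1.0, 3)]
root.programs = [
    Program(lambda s: s[0], left),
    Program(lambda s: s[1], right),
    Program(lambda s: 0.0, 0),
]

for state in ([0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]):
    action, path = resolve_action(root, state)
    print(state, "->", action, "via", path)
```

In this toy graph, no decision touches more than two of the three teams, and one state is resolved at the root alone. This is the sense in which such a policy is sparse: the cost of a decision depends on the path taken, not on the size of the whole graph.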
