Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial

Tangled Program Graphs (TPG) is a framework in which multiple programs are organized to cooperate and decompose a task with minimal a priori information. TPG agents begin with minimal complexity and incrementally coevolve to discover a complexity befitting the nature of the task. Previous research has demonstrated that, on visual reinforcement learning tasks from the Arcade Learning Environment and the ViZDoom first-person shooter game, TPG discovers solutions that are competitive with those from Deep Learning. However, unlike Deep Learning, the emergent constructive properties of TPG result in solutions that are orders of magnitude simpler, so execution never requires specialized hardware support. In this work, our goal is to provide a tutorial overview demonstrating how the emergent properties of TPG are achieved, and to provide specific examples of decompositions discovered under the ViZDoom task.
