A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks

In visual reinforcement learning, decision-making policies are identified under delayed rewards from an environment, and state information takes the form of high-dimensional data, such as video. Although the video might characterize a 3D world in high resolution, partial observability places significant limits on what the agent can actually perceive of that world. The agent therefore also has to: (1) encode state efficiently, (2) store those encodings efficiently in some form of memory, and (3) recall such memories after arbitrary delays for decision making. In this work, we demonstrate how an external memory model facilitates decision making in multi-agent ‘deathmatches’ in the ViZDoom first-person shooter environment. ViZDoom provides a complex world of multiple rooms and resources in which agents are spawned from multiple different locations. A unique approach is adopted to defining external memory for genetic programming agents, in which: (1) the state of memory is shared across all programs; (2) writing is formulated as a probabilistic process, so that different regions of memory hold short-term versus long-term content; and (3) read operations are indexed, enabling programs to identify regions of external memory with specific temporal properties. We demonstrate that agents purposefully navigate the world when external memory is provided, whereas those without external memory are limited to merely ‘flight or fight’ behaviour.
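The three memory properties named above (shared state, probabilistic writes, indexed reads) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `SharedExternalMemory`, the row and column layout, and the rule that the write probability halves with each row index are all assumptions chosen only to make the idea concrete.

```python
import random


class SharedExternalMemory:
    """Toy external memory shared by all programs of an agent (hypothetical design)."""

    def __init__(self, n_rows=8, width=4, seed=0):
        self.rows = [[0.0] * width for _ in range(n_rows)]
        # Assumption: write probability halves with each row index, so row 0 is
        # always overwritten (short-term) and high rows rarely are (long-term).
        self.write_prob = [0.5 ** i for i in range(n_rows)]
        self.rng = random.Random(seed)

    def write(self, values):
        """Probabilistic write: each row independently accepts the new content."""
        for i, p in enumerate(self.write_prob):
            if self.rng.random() < p:
                self.rows[i] = list(values)

    def read(self, row, col):
        """Indexed read: the caller picks the row, and so the time scale, to inspect."""
        return self.rows[row % len(self.rows)][col % len(self.rows[0])]


# Two programs hold a reference to the same memory object, so state is shared.
memory = SharedExternalMemory()
program_a_memory = memory
program_b_memory = memory

program_a_memory.write([1.0, 2.0, 3.0, 4.0])  # program A writes an encoding
latest = program_b_memory.read(0, 0)          # program B reads the short-term row
```

Under these assumptions, row 0 always holds the most recent write, while higher rows change less and less often, so a program that reads a high row index is in effect consulting older content.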
