Emergent Solutions to High-Dimensional Multitask Reinforcement Learning

Reinforcement learning (RL) algorithms, which learn through environmental interaction and delayed rewards, increasingly face the challenge of scaling to dynamic, high-dimensional, and partially observable environments. Significant attention is being paid to frameworks from deep learning, which scale to high-dimensional data by decomposing the task through multilayered neural networks. While effective, the resulting representation is complex and computationally demanding. In this work, we propose a framework based on genetic programming which adaptively complexifies policies through interaction with the task. In the challenging Atari video game environment, we compare it directly with several deep reinforcement learning frameworks as well as with more traditional reinforcement learning frameworks based on a priori engineered features. Results indicate that the proposed approach matches the quality of deep learning while being at least three orders of magnitude simpler with respect to model complexity. As a result, the champion RL agent operates in real time without recourse to specialized hardware support. Moreover, the approach is capable of evolving solutions to multiple game titles simultaneously with no additional computational cost. In this case, agent behaviours for an individual game as well as single agents capable of playing all games emerge from the same evolutionary run.
