Evolving inborn knowledge for fast adaptation in dynamic POMDP problems

Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in partially observable Markov decision process (POMDP) environments because the system state, essential in an RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. The evolution of online learning mechanisms, by contrast, can embed learning strategies in an agent that (i) evolves memory when required and (ii) optimizes adaptation speed for specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. Analysis of the evolved networks reveals that the proposed algorithm acquires inborn knowledge in several forms, such as detecting cues that reveal implicit rewards and evolving location neurons that aid navigation. The integration of inborn knowledge and online plasticity enables fast adaptation and better performance than some non-evolutionary meta-reinforcement learning algorithms. The algorithm also succeeds in the 3D game environment Malmo Minecraft.
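To make the mechanism concrete, the minimal Python/NumPy sketch below shows one common form of neuromodulated Hebbian plasticity: a modulatory neuron gates weight changes on the action pathway, and the controller consumes an autoencoder-style latent code instead of the raw observation. The layer sizes, the random-projection stand-in for the encoder, the exact update rule, and the learning rate eta are illustrative assumptions, not the paper's architecture; in the evolutionary setting the initial weights, modulatory connections, and eta would be part of the evolved genome.

```python
import numpy as np

rng = np.random.default_rng(0)

class NeuromodulatedController:
    """Minimal neuromodulated plastic network: a modulatory neuron
    gates Hebbian weight updates on the standard (action) pathway."""

    def __init__(self, n_latent, n_hidden, n_actions, eta=0.05):
        self.Wh = rng.normal(0, 0.5, (n_hidden, n_latent))   # latent -> hidden (plastic)
        self.Wo = rng.normal(0, 0.5, (n_actions, n_hidden))  # hidden -> action (fixed)
        self.wm = rng.normal(0, 0.5, n_latent)                # latent -> modulatory neuron
        self.eta = eta                                        # learning rate (evolvable)

    def step(self, z):
        h = np.tanh(self.Wh @ z)           # hidden activity
        m = np.tanh(self.wm @ z)           # modulatory signal in [-1, 1]
        # Neuromodulated Hebbian update: weights change only when m != 0,
        # so the network itself decides when online learning happens.
        self.Wh += self.eta * m * np.outer(h, z)
        self.Wh = np.clip(self.Wh, -3.0, 3.0)  # keep plastic weights bounded
        return np.tanh(self.Wo @ h)            # action outputs

# Stand-in for the autoencoder: a fixed random projection of the raw
# observation to a latent code (a real system would use a trained encoder).
def encode(obs, P=rng.normal(0, 1, (8, 32))):
    return np.tanh(P @ obs)

ctrl = NeuromodulatedController(n_latent=8, n_hidden=16, n_actions=4)
for _ in range(10):                     # one short episode of online adaptation
    obs = rng.normal(size=32)           # placeholder POMDP observation
    action = ctrl.step(encode(obs))
```

Because the plastic weights Wh change within an episode while the genome only fixes their starting point, this kind of controller can adapt online to reward contingencies that the evolutionary process never saw directly, which is the "inborn knowledge plus plasticity" combination the abstract describes.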
