From Crystallized Adaptivity to Fluid Adaptivity in Deep Reinforcement Learning — Insights from Biological Systems on Adaptive Flexibility

Recent developments in machine-learning algorithms have led to impressive performance gains in many traditional application scenarios of artificial intelligence research. In deep reinforcement learning, deep-learning architectures are combined with incremental learning schemes for sequential tasks that involve interaction-based but often delayed feedback. Despite these successes, modern machine-learning approaches, including deep reinforcement learning, still perform poorly compared with flexibly adaptive biological systems in certain naturally occurring scenarios. Such scenarios include transfer to environments different from those in which training took place, and environments that change dynamically. Biological systems often master both through a capability that we here term “fluid adaptivity”, to contrast it with the much slower adaptivity (“crystallized adaptivity”) of the prior learning from which the behavior emerged. In this article, we derive and discuss research strategies, based on analyses of fluid adaptivity in biological systems and its neuronal modeling, that might help equip future artificially intelligent systems with capabilities of fluid adaptivity more similar to those seen in some biologically intelligent systems. A key component of this research strategy is the dynamization of the problem space itself and the implementation of this dynamization by suitably designed, flexibly interacting modules.
