Reinforcement learning: bringing together computation and cognition

A hallmark of human intelligence is the ability to learn very quickly, an ability that artificial intelligence still lacks. This article highlights recent research showing how bringing together the fields of artificial intelligence and cognitive science may benefit both. Ideas from artificial intelligence have provided helpful formal theories that account for aspects of human learning. In return, ideas from cognitive science and neuroscience can point artificial intelligence research toward more human-like algorithms. For example, recent work shows that human learning is best understood as the product of multiple separate, interacting memory systems, rather than as a single, complex learner. This insight is starting to show promise for improving the learning efficiency of artificial agents.
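The multiple-memory-systems idea can be made concrete with a toy model in the spirit of mixture models of reinforcement learning and working memory (as in Collins & Frank's RL+WM framework): a slow, incremental Q-learner combined with a fast but capacity-limited, decaying working-memory store, with choices drawn from a weighted mixture of the two policies. The sketch below is purely illustrative; all parameter values and the specific decay rule are assumptions, not a reproduction of any published model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(values, beta):
    """Convert a value vector into choice probabilities."""
    e = np.exp(beta * (values - values.max()))
    return e / e.sum()

def rl_wm_agent(n_stimuli=3, n_actions=3, n_trials=300,
                alpha=0.1, beta=8.0, wm_weight=0.8, decay=0.05):
    """Simulate a mixture of a slow RL learner and a fast, decaying
    working-memory module on a stimulus-action learning task.
    Returns mean accuracy over the last 100 trials."""
    correct = rng.integers(n_actions, size=n_stimuli)   # hidden mapping
    q = np.full((n_stimuli, n_actions), 1.0 / n_actions)   # RL values
    wm = np.full((n_stimuli, n_actions), 1.0 / n_actions)  # WM store
    rewards = []
    for _ in range(n_trials):
        s = rng.integers(n_stimuli)
        # Policy: weighted mixture of the WM and RL softmax policies.
        p = (wm_weight * softmax(wm[s], beta)
             + (1 - wm_weight) * softmax(q[s], beta))
        a = rng.choice(n_actions, p=p)
        r = float(a == correct[s])
        # RL module: slow, incremental delta-rule update.
        q[s, a] += alpha * (r - q[s, a])
        # WM module: one-shot storage of the latest outcome...
        wm[s, a] = r
        # ...which decays toward uniform on every trial.
        wm += decay * (1.0 / n_actions - wm)
        rewards.append(r)
    return float(np.mean(rewards[-100:]))
```

Because working memory stores each outcome in one shot while the Q-learner accumulates slowly, the mixture learns faster early on than the RL module alone would, which is the qualitative signature these hybrid models are meant to capture.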
