The role of executive function in shaping reinforcement learning

Reinforcement learning (RL) models have advanced our understanding of how animals learn and make decisions, and how the brain supports learning. However, the neural computations that are explained by RL algorithms fall short of explaining many sophisticated aspects of human learning and decision making, including the generalization of behavior to novel contexts, one-shot learning, and the synthesis of task information in complex environments. Instead, these aspects of behavior are assumed to be supported by the brain’s executive functions (EF). We review recent findings that highlight the importance of EF in instrumental learning. Specifically, we advance the theory that EF sets the stage for canonical RL computations in the brain, providing inputs that broaden their flexibility and applicability. Our theory has important implications for how to interpret RL computations in both brain and behavior.
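The core claim, that executive function supplies the inputs over which canonical RL computations operate, can be illustrated with a minimal sketch. Below, a tabular Q-learner performs standard reward-prediction-error updates, but the *state* it learns over is produced by a stand-in EF module that filters out a task-irrelevant feature. All names and the toy task are invented for illustration; this is not a model from the review.

```python
import random
from collections import defaultdict

random.seed(0)

def ef_state(observation):
    """Stand-in for an executive-function module: compresses a raw
    observation into a task state by discarding an irrelevant feature."""
    relevant, _irrelevant = observation
    return relevant

def q_learning(episodes, alpha=0.1, epsilon=0.1):
    """Canonical model-free Q-learning over EF-defined states.
    Toy task: a contextual bandit where state 'A' rewards action 1
    and state 'B' rewards action 0."""
    Q = defaultdict(float)  # keyed by (state, action), initialized to 0
    actions = [0, 1]
    for _ in range(episodes):
        obs = (random.choice("AB"), random.random())  # (relevant, irrelevant)
        s = ef_state(obs)  # EF sets the stage: RL never sees the raw input
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[(s, x)])
        r = 1.0 if (s == "A") == (a == 1) else 0.0
        # canonical reward-prediction-error update (one-step, no bootstrap)
        delta = r - Q[(s, a)]
        Q[(s, a)] += alpha * delta
    return Q

Q = q_learning(2000)
```

Because the EF stage collapses observations that differ only in the irrelevant feature, the downstream RL update generalizes across them automatically; with a raw (ungrouped) observation space the same learner would have to relearn each value from scratch, which is the flexibility-broadening role the theory ascribes to EF.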
