The role of executive function in shaping reinforcement learning

Reinforcement learning (RL) models have advanced our understanding of how animals learn and make decisions, and how the brain supports some aspects of learning. However, the neural computations that RL algorithms explain fall short of accounting for many sophisticated aspects of human decision making, including the generalization of learned information, one-shot learning, and the synthesis of task information in complex environments. Instead, these aspects of instrumental behavior are assumed to be supported by the brain's executive functions (EF). We review recent findings that highlight the importance of EF in learning. Specifically, we advance the theory that EF sets the stage for canonical RL computations in the brain, providing inputs that broaden their flexibility and applicability. Our theory has important implications for how to interpret RL computations in the brain and in behavior.
