Learning to select actions shapes recurrent dynamics in the corticostriatal system

Learning to select appropriate actions based on their values is fundamental to adaptive behavior. This form of learning is supported by fronto-striatal systems: the dorsolateral prefrontal cortex (dlPFC) and the dorsal striatum (dSTR), which are strongly interconnected, are key nodes in this circuitry. Substantial experimental evidence, including neurophysiological recordings, has shown that neurons in these structures represent key aspects of learning. The computational mechanisms that shape these neurophysiological responses, however, remain unclear. To examine this, we developed a recurrent neural network (RNN) model of the dlPFC-dSTR circuit, trained it on an oculomotor sequence learning task, and compared the activity generated by the model to activity recorded from monkey dlPFC and dSTR in the same task. The network consisted of a striatal component that encoded action values and a prefrontal component that selected appropriate actions. After training, this system autonomously represented and updated action values and selected actions, closely approximating the representational structure seen in corticostriatal recordings. In both the model and the neural data, learning to select the correct actions drove action-sequence representations further apart in activity space. The model showed that this growing separation between sequence-specific representations makes it more likely that the appropriate action sequence is selected as learning progresses. Our results thus support the hypothesis that learning drives the neural representations of actions further apart, increasing the probability that the network generates correct actions. Altogether, this study advances our understanding of how neural circuit dynamics contribute to neural computation, showing how dynamics in the corticostriatal system support task learning.
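The two-component architecture described above can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is an assumed, simplified rendering of the idea: a recurrent "striatal" module that maintains action values from task input and feedback, feeding a recurrent "prefrontal" module that selects actions. Module sizes, connectivity, the use of PyTorch, and all parameter names are hypothetical choices made for illustration.

```python
# Minimal sketch (assumptions, not the paper's model) of a two-module RNN:
# a "striatal" recurrent module that carries action values and a "prefrontal"
# recurrent module that reads the striatal state and selects actions.
import torch
import torch.nn as nn

class CorticostriatalRNN(nn.Module):
    def __init__(self, n_inputs=4, n_actions=4, n_str=100, n_pfc=100):
        super().__init__()
        # Striatal module: receives task inputs (cues, reward feedback)
        self.str_rnn = nn.RNNCell(n_inputs, n_str, nonlinearity="tanh")
        # Prefrontal module: receives the striatal state and selects actions
        self.pfc_rnn = nn.RNNCell(n_str, n_pfc, nonlinearity="tanh")
        self.value_readout = nn.Linear(n_str, n_actions)   # action values
        self.action_readout = nn.Linear(n_pfc, n_actions)  # action logits
        self.n_str, self.n_pfc = n_str, n_pfc

    def forward(self, inputs):
        # inputs: (time, batch, n_inputs)
        T, B, _ = inputs.shape
        h_str = inputs.new_zeros(B, self.n_str)
        h_pfc = inputs.new_zeros(B, self.n_pfc)
        values, logits = [], []
        for t in range(T):
            h_str = self.str_rnn(inputs[t], h_str)
            h_pfc = self.pfc_rnn(h_str, h_pfc)
            values.append(self.value_readout(h_str))
            logits.append(self.action_readout(h_pfc))
        return torch.stack(values), torch.stack(logits)

# Example usage on random input (a stand-in for the oculomotor sequence task):
model = CorticostriatalRNN()
x = torch.randn(50, 8, 4)            # 50 time steps, batch of 8 trials
values, logits = model(x)
actions = logits[-1].argmax(dim=-1)  # action selected at the final time step
print(values.shape, logits.shape, actions.shape)
```

In this sketch the separation into value-coding and action-selection modules is enforced by construction; in the study, the analogous division of labor and the training procedure are as reported by the authors, not as shown here.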
