A neural network model for the orbitofrontal cortex and task space acquisition during reinforcement learning

Reinforcement learning has been widely used to explain animal behavior. In reinforcement learning, an agent learns the values of the states in a task, which collectively constitute the task state space, and uses this knowledge to choose actions and obtain desired outcomes. It has been proposed that the orbitofrontal cortex (OFC) encodes the task state space during reinforcement learning. However, it is not well understood how the OFC acquires and stores task state information. Here, we propose a neural network model based on reservoir computing. Reservoir networks exhibit heterogeneous and dynamic activity patterns that are suitable for encoding task states. This information can be extracted by a linear readout trained with reinforcement learning. We demonstrate how the network acquires and stores task structures. The network exhibits reinforcement learning behavior, and aspects of it resemble experimental findings from the OFC. Our study provides a theoretical explanation of how the OFC may contribute to reinforcement learning and a new approach to understanding the neural mechanisms underlying reinforcement learning.
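
The idea of a fixed reservoir with a linear readout trained by reinforcement can be illustrated with a minimal sketch. This is not the authors' model. The toy cue-action task, the function name `run_reservoir_bandit`, the network size, the gain of 0.9, the softmax choice rule, and the normalized delta-rule update are all assumptions chosen for illustration. Only the readout weights `w_out` are learned; the recurrent and input weights stay fixed.

```python
import numpy as np

def run_reservoir_bandit(n_units=100, n_trials=600, n_steps=10, seed=0):
    """Toy task (assumed): action k is rewarded when cue k is shown."""
    rng = np.random.default_rng(seed)
    # Fixed random reservoir; a gain below 1 keeps the dynamics stable (assumed value).
    w_rec = 0.9 * rng.standard_normal((n_units, n_units)) / np.sqrt(n_units)
    w_in = rng.standard_normal((n_units, 2))   # fixed input weights
    w_out = np.zeros((2, n_units))             # plastic linear readout, one row per action
    alpha, beta = 0.2, 5.0                     # learning rate, softmax inverse temperature
    correct = np.zeros(n_trials, dtype=bool)

    for t in range(n_trials):
        cue = rng.integers(2)
        u = np.eye(2)[cue]                     # one-hot cue input
        x = np.zeros(n_units)                  # reservoir state is reset on every trial
        for _ in range(n_steps):
            x = np.tanh(w_rec @ x + w_in @ u)  # reservoir update

        q = w_out @ x                          # linear readout: one value per action
        p = np.exp(beta * (q - q.max()))       # softmax over action values
        p /= p.sum()
        action = rng.choice(2, p=p)

        reward = float(action == cue)
        delta = reward - q[action]             # reward prediction error
        # Delta rule on the chosen action's readout row, normalized by the state norm.
        w_out[action] += alpha * delta * x / (x @ x)
        correct[t] = action == cue
    return correct, w_out

if __name__ == "__main__":
    correct, _ = run_reservoir_bandit()
    print("accuracy over the last 100 trials:", correct[-100:].mean())
```

Because the reservoir maps the two cues onto distinct high-dimensional states, a purely linear readout is enough to assign different action values to each cue. The state is reset on every trial here for simplicity, so this sketch does not capture history-dependent task states.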
