Ventral Striatum and Orbitofrontal Cortex Are Both Required for Model-Based, But Not Model-Free, Reinforcement Learning

In many cases, learning is thought to be driven by differences between the value of the rewards we expect and the value of the rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues, and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
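The contrast between value-based and identity-based prediction errors can be made concrete with a toy simulation. This is an illustrative sketch only, not the authors' model or analysis. It assumes a single-step delta rule, which is the form TD learning takes when each trial is a single time step and cue values sum linearly (i.e. the Rescorla-Wagner rule). The cue names, outcome identities, learning rate, trial counts and probe readout are all hypothetical. The "identity-based" learner simply computes a separate error for each outcome identity, as a crude stand-in for the outcome-specific expectations that a model-based representation would supply.

```python
from typing import Dict, List, Tuple

# Hypothetical parameters, chosen for illustration only.
ALPHA = 0.2          # learning rate
N_TRIALS = 50        # trials per stage
IDENTITIES = ("food_A", "food_B")

Outcome = Dict[str, float]             # outcome identity -> amount delivered
Trial = Tuple[Tuple[str, ...], Outcome]


def train_model_free(trials: List[Trial], alpha: float = ALPHA) -> Dict[str, float]:
    """Scalar delta rule: the prediction error only sees the total amount of reward."""
    v: Dict[str, float] = {}
    for cues, outcome in trials:
        r = sum(outcome.values())  # outcome identity is discarded here
        delta = r - sum(v.get(c, 0.0) for c in cues)
        for c in cues:
            v[c] = v.get(c, 0.0) + alpha * delta
    return v


def train_identity_based(trials: List[Trial], alpha: float = ALPHA) -> Dict[Tuple[str, str], float]:
    """Delta rule run separately for each outcome identity (one error per identity)."""
    w: Dict[Tuple[str, str], float] = {}
    for cues, outcome in trials:
        for k in IDENTITIES:
            delta = outcome.get(k, 0.0) - sum(w.get((c, k), 0.0) for c in cues)
            for c in cues:
                w[(c, k)] = w.get((c, k), 0.0) + alpha * delta
    return w


def design(stage2_outcome: Outcome) -> List[Trial]:
    """Stage 1: light -> 1 unit of food_A. Stage 2: light + tone -> stage2_outcome."""
    stage1 = [(("light",), {"food_A": 1.0})] * N_TRIALS
    stage2 = [(("light", "tone"), stage2_outcome)] * N_TRIALS
    return stage1 + stage2


CONDITIONS: Dict[str, Outcome] = {
    "blocked (no change)": {"food_A": 1.0},
    "value upshift": {"food_A": 3.0},
    "identity shift": {"food_B": 1.0},
}


def probe(condition: str) -> Tuple[float, float]:
    """Strength the tone acquired under each learner (stand-in for the tone-alone probe)."""
    trials = design(CONDITIONS[condition])
    mf = train_model_free(trials).get("tone", 0.0)
    w = train_identity_based(trials)
    # Hypothetical readout: responding driven by any outcome the tone predicts positively.
    ib = sum(max(0.0, w.get(("tone", k), 0.0)) for k in IDENTITIES)
    return mf, ib


if __name__ == "__main__":
    for name in CONDITIONS:
        mf, ib = probe(name)
        print(f"{name:22s} model-free={mf:.3f} identity-based={ib:.3f}")
```

Under these assumptions, both learners leave the tone near zero when the outcome is unchanged (blocking), and both assign it strength after the value upshift. Only the identity-based learner assigns the tone positive strength after the identity shift, because the scalar error is already near zero when the total amount of food is unchanged.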
