A universal role of the ventral striatum in reward-based learning: Evidence from human studies

Reinforcement learning enables organisms to adjust their behavior in order to maximize rewards. Electrophysiological recordings in many species have shown that dopaminergic midbrain neurons code the difference between actual and predicted rewards, i.e., the reward prediction error. This error signal is conveyed to both the striatum and cortical areas and is thought to play a central role in learning to optimize behavior. However, in human daily life rewards are diverse, and often only indirect feedback is available. Here we explore the range of rewards that are processed by the dopaminergic system in human participants and examine whether this system is also involved in learning in the absence of explicit rewards. While results from electrophysiological recordings in humans are sparse, there is evidence linking dopaminergic activity to the metabolic signal recorded from the midbrain and striatum with functional magnetic resonance imaging (fMRI). Results from fMRI studies suggest that the human ventral striatum (VS) receives valuation information for a diverse set of rewarding stimuli. These range from simple primary reinforcers such as juice rewards, through abstract social rewards, to internally generated signals of perceived correctness, suggesting that the VS is involved in trial-and-error learning irrespective of the specific nature of the rewards provided. In addition, we summarize evidence that the VS is also implicated in learning from observing others and in tasks that go beyond simple stimulus-action-outcome learning, indicating that the reward system is also recruited in more complex learning tasks.
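
The reward prediction error mentioned above, the difference between actual and predicted reward, can be illustrated with a minimal sketch. It assumes a simple Rescorla-Wagner-style value update, which is a common textbook formulation and not necessarily the model used in any of the studies reviewed; the function names, the learning rate, and the reward sequence are hypothetical choices made only for illustration.

```python
def update_value(value, reward, learning_rate=0.1):
    """One trial of a Rescorla-Wagner-style update (illustrative assumption).

    Returns the updated reward prediction and the prediction error.
    """
    # Prediction error: actual reward minus predicted reward.
    prediction_error = reward - value
    # Move the prediction a fraction of the way toward the outcome.
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


def simulate(rewards, learning_rate=0.1, initial_value=0.0):
    """Run the update over a sequence of rewards; return final value and errors."""
    value = initial_value
    errors = []
    for reward in rewards:
        value, delta = update_value(value, reward, learning_rate)
        errors.append(delta)
    return value, errors


if __name__ == "__main__":
    # A hypothetical cue that is always followed by a reward of 1.0.
    final_value, errors = simulate([1.0] * 50, learning_rate=0.2)
    print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.5f}")
    print(f"learned prediction: {final_value:.5f}")
```

In this sketch the prediction error is large when the reward is unexpected and shrinks toward zero as the reward becomes fully predicted, which is the qualitative pattern the abstract attributes to the dopaminergic error signal.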
