Neural Circuitry of Reward Prediction Error

Dopamine neurons facilitate learning by calculating reward prediction error, the difference between actual and expected reward. Despite two decades of research, it remains unclear how dopamine neurons make this calculation. Here we review studies that tackle this problem using a diverse set of approaches, ranging from anatomy and electrophysiology to computational modeling and behavior. Several patterns emerge from this synthesis: that dopamine neurons themselves calculate reward prediction error, rather than inheriting it passively from upstream regions; that they combine multiple separate and redundant inputs, which are themselves interconnected in a dense recurrent network; and that despite the complexity of these inputs, the output from dopamine neurons is remarkably homogeneous and robust. The more we study this simple arithmetic computation, the knottier it appears to be, suggesting a daunting (but stimulating) path ahead for neuroscience more generally.
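As a purely illustrative aside, the "simple arithmetic computation" can be sketched as a subtraction followed by a delta-rule update of the expectation. This is a minimal sketch and not a model taken from the reviewed studies: the function names, the learning rate of 0.1, and the sign convention (actual minus expected, so that a better-than-expected reward gives a positive error) are all assumptions made for the example.

```python
def reward_prediction_error(actual_reward: float, expected_reward: float) -> float:
    """Signed error: positive when the reward exceeds expectation (assumed convention)."""
    return actual_reward - expected_reward


def update_expectation(expected_reward: float, actual_reward: float,
                       learning_rate: float = 0.1) -> float:
    """Hypothetical delta-rule update: move the expectation a fraction of the way
    toward the observed reward. The learning rate is an illustrative value."""
    delta = reward_prediction_error(actual_reward, expected_reward)
    return expected_reward + learning_rate * delta


# Repeatedly deliver the same reward (1.0) after a cue, starting from no expectation.
expected = 0.0
errors = []
for _ in range(50):
    errors.append(reward_prediction_error(1.0, expected))
    expected = update_expectation(expected, 1.0)

# Omitting a fully expected reward yields a negative error.
omission_error = reward_prediction_error(0.0, 1.0)
```

In this toy setting the error starts at its maximum on the first trial and shrinks toward zero as the expectation approaches the delivered reward, while an omitted reward produces a negative error. How neural circuits implement even this subtraction is the open question the review addresses.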
