Arithmetic and local circuitry underlying dopamine prediction errors

Dopamine neurons are thought to facilitate learning by comparing actual and expected reward. Despite two decades of investigation, little is known about how this comparison is made. To determine how dopamine neurons calculate prediction error, we combined optogenetic manipulations with extracellular recordings in the ventral tegmental area while mice engaged in classical conditioning. Here we demonstrate, by manipulating the temporal expectation of reward, that dopamine neurons perform subtraction, a computation that is ideal for reinforcement learning but rarely observed in the brain. Furthermore, selectively exciting and inhibiting neighbouring GABA (γ-aminobutyric acid) neurons in the ventral tegmental area reveals that these neurons are a source of subtraction: they inhibit dopamine neurons when reward is expected, causally contributing to prediction-error calculations. Finally, bilaterally stimulating ventral tegmental area GABA neurons dramatically reduces anticipatory licking to conditioned odours, consistent with an important role for these neurons in reinforcement learning. Together, our results uncover the arithmetic and local circuitry underlying dopamine prediction errors.
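The distinction between subtraction and division can be made concrete with a toy calculation. The sketch below is illustrative only: the function names, the response values, and the divisive form r / (1 + e) are assumptions chosen for clarity and are not taken from the paper's data or analysis. It shows that under subtraction, expectation lowers the response by the same amount at every reward size, whereas under division the reduction grows with the response.

```python
# Toy comparison of subtractive and divisive effects of expectation.
# All numbers and functional forms are hypothetical, for illustration only.

def subtractive_response(reward_response, expectation):
    """Prediction error as a difference: actual minus expected."""
    return reward_response - expectation

def divisive_response(reward_response, expectation):
    """Assumed divisive form: the response is scaled down by expectation."""
    return reward_response / (1.0 + expectation)

# Hypothetical responses to three reward sizes when reward is unexpected.
reward_responses = [2.0, 4.0, 8.0]
expectation = 1.0  # hypothetical strength of reward expectation

sub = [subtractive_response(r, expectation) for r in reward_responses]
div = [divisive_response(r, expectation) for r in reward_responses]

# Reduction caused by expectation at each reward size.
sub_shift = [r - s for r, s in zip(reward_responses, sub)]
div_shift = [r - d for r, d in zip(reward_responses, div)]

print("subtractive:", sub, "reduction:", sub_shift)
print("divisive:   ", div, "reduction:", div_shift)
```

In this sketch the subtractive reduction is constant across reward sizes, which is the signature of a downward shift of the whole response curve, while the divisive reduction scales with the response, which corresponds to a change in gain.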
