A Unified Framework for Dopamine Signals across Timescales

Rapid phasic activity of midbrain dopamine neurons is thought to signal reward prediction errors (RPEs), resembling the temporal difference errors used in machine learning. Recent studies describing slowly increasing dopamine signals have instead proposed that these signals represent state values and arise independently of somatic spiking activity. Here, we developed novel experimental paradigms using virtual reality that disambiguate RPEs from values. We examined dopamine circuit activity at several stages, including somatic spiking, axonal calcium signals, and striatal dopamine concentrations. Our results demonstrate that ramping dopamine signals are consistent with RPEs rather than values, and that this ramping is observed at all stages examined. We further show that ramping dopamine signals can be driven by a dynamic stimulus that indicates a gradual approach to a reward. We provide a unified computational understanding of rapid phasic and slowly ramping dopamine signals: dopamine neurons perform a derivative-like computation over values on a moment-by-moment basis.
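
The derivative-like computation can be illustrated with a minimal sketch of the standard temporal difference error, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). This is not the authors' model or analysis code: the function name `td_errors`, the 11-step task, the quadratic and linear value functions, and gamma = 1 are all assumptions chosen for illustration. Under these assumptions, a convex value function gives a TD error that ramps up during the approach to reward, whereas a linear value function gives a constant TD error.

```python
import numpy as np

def td_errors(values, rewards, gamma=1.0):
    """Return delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) for each transition."""
    values = np.asarray(values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    # values has one more entry than rewards: one reward per transition.
    return rewards + gamma * values[1:] - values[:-1]

# Hypothetical task: 11 states visited in order while approaching a reward.
steps = np.arange(11)
convex_value = (steps / 10.0) ** 2   # assumed convex value function
linear_value = steps / 10.0          # assumed linear value function
no_reward = np.zeros(10)             # only the approach, before reward delivery

ramping = td_errors(convex_value, no_reward)  # grows from step to step
flat = td_errors(linear_value, no_reward)     # same at every step

print(np.round(ramping, 2))
print(np.round(flat, 2))
```

With gamma = 1 the TD error reduces to the discrete difference of the value function, so its time course tracks the slope of value rather than value itself. That is the sense in which an RPE signal and a value signal can make different predictions.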
