A gradual backward shift of dopamine responses during associative learning

It has been proposed that the activity of dopamine neurons approximates the temporal difference (TD) prediction error, a teaching signal developed in reinforcement learning, a field of machine learning. However, whether this similarity holds during learning itself remains unclear. In particular, some TD learning models predict that the error signal shifts gradually backward in time, from reward delivery to the reward-predictive cue, but previous experiments did not observe such a gradual shift in dopamine activity. Here we demonstrate conditions under which such a shift can be detected experimentally. These shared dynamics of TD error and dopamine activity narrow the gap between machine learning theory and biological brains, tightening a long-sought link.
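
The backward shift predicted by some TD models can be illustrated with a small simulation. The following is a minimal sketch, not the model used in this work: it assumes textbook TD(0) learning with one value estimate per within-trial time step (a complete-serial-compound style representation), and the function names, trial layout, learning rate, and discount factor are all hypothetical choices made for illustration.

```python
def simulate_td_errors(n_trials=200, n_steps=20, cue_step=5, reward_step=15,
                       alpha=0.2, gamma=0.98):
    """Return one TD-error trace per trial for a cue -> delay -> reward trial.

    All parameter values are illustrative assumptions, not fitted to data.
    """
    values = [0.0] * n_steps  # one value estimate per within-trial time step
    traces = []
    for _ in range(n_trials):
        deltas = [0.0] * n_steps
        for t in range(n_steps - 1):
            reward = 1.0 if t + 1 == reward_step else 0.0
            # TD error registered on arriving at step t + 1.
            delta = reward + gamma * values[t + 1] - values[t]
            deltas[t + 1] = delta
            # Assumption: the cue arrives unpredictably, so pre-cue states
            # cannot predict reward and their values are held at zero.
            if t >= cue_step:
                values[t] += alpha * delta
        traces.append(deltas)
    return traces


def peak_step(deltas):
    """Time step at which the TD error is largest within one trial."""
    return max(range(len(deltas)), key=lambda t: deltas[t])


if __name__ == "__main__":
    traces = simulate_td_errors()
    for trial in (0, 5, 10, 20, 40, 199):
        print(f"trial {trial:3d}: peak TD error at step {peak_step(traces[trial])}")
```

Under these assumptions, the largest TD error sits at the reward step on the first trial, passes through intermediate delay steps on later trials, and ends at the cue step once learning has converged. How gradual the shift looks depends strongly on the state representation and learning rate, so other TD variants need not show the same pattern.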
