Dynamic resource allocation during reinforcement learning accounts for ramping and phasic dopamine activity

For an animal to learn about its environment with limited motor and cognitive resources, it should focus its resources on potentially important stimuli. However, too narrow a focus is disadvantageous for adaptation to environmental changes. Midbrain dopamine neurons are excited by potentially important stimuli, such as reward-predicting or novel stimuli, and allocate resources to these stimuli by modulating how an animal approaches, exploits, explores, and attends. The current study examined the theoretical possibility that dopamine activity reflects the dynamic allocation of resources for learning. Dopamine activity may transition between two patterns: (1) phasic responses to cues and rewards, and (2) ramping activity arising as the agent approaches the reward. Phasic excitation has been explained by prediction errors generated by experimentally inserted cues. However, when and why dopamine activity transitions between the two patterns remains unknown. By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine activity transitions from ramping to phasic patterns as the agent focuses its resources on a small number of reward-predicting stimuli, thus leading to task dimensionality reduction. The opposite occurs when the agent redistributes its resources to adapt to environmental changes, resulting in task dimensionality expansion. This research elucidates the role of dopamine in a broader context, providing a potential explanation for the diverse repertoire of dopamine activity that cannot be explained solely by prediction error.
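
For reference, the standard TD account of phasic responses mentioned above can be sketched in a few lines. This is a minimal illustration of textbook TD(0) learning with one learned value per time step between cue and reward; it is not the modified model used in the study and does not reproduce ramping. The function name `simulate_td`, the trial layout, and all parameter values are assumptions chosen for illustration only.

```python
def simulate_td(n_trials=500, n_steps=20, cue_step=5, reward_step=15,
                alpha=0.1, gamma=0.98):
    """Return per-trial lists of TD errors for a cue -> delay -> reward trial.

    Assumed setup (hypothetical, for illustration): the cue appears at
    cue_step, a reward of 1.0 arrives at reward_step, and only the time
    steps from cue onset up to the reward carry a learned value.
    """
    weights = [0.0] * n_steps  # one value estimate per time step

    def value(t):
        # Time steps outside the cue-to-reward interval have no stimulus
        # representation, so their value is fixed at zero.
        return weights[t] if cue_step <= t < reward_step else 0.0

    history = []
    for _ in range(n_trials):
        deltas = [0.0] * n_steps
        for t in range(n_steps - 1):
            reward = 1.0 if t + 1 == reward_step else 0.0
            # TD error: delta = r + gamma * V(next) - V(current)
            delta = reward + gamma * value(t + 1) - value(t)
            if cue_step <= t < reward_step:
                weights[t] += alpha * delta
            # Register the error at the moment the new observation arrives.
            deltas[t + 1] = delta
        history.append(deltas)
    return history


if __name__ == "__main__":
    errors = simulate_td()
    print("first trial, error at reward:", round(errors[0][15], 3))
    print("last trial, error at reward: ", round(errors[-1][15], 3))
    print("last trial, error at cue:    ", round(errors[-1][5], 3))
```

Under these assumptions, the prediction error on the first trial occurs at reward delivery, and after training it occurs at cue onset instead, which is the phasic pattern that prediction-error accounts are usually invoked to explain.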
