A feature-specific prediction error model explains dopaminergic heterogeneity

The hypothesis that midbrain dopamine (DA) neurons broadcast an error in the prediction of reward (reward prediction error, RPE) is among the great successes of computational neuroscience [1–3]. However, recent results contradict a core aspect of this theory: that the neurons uniformly convey a scalar, global signal. For instance, when animals are placed in a high-dimensional environment, DA neurons in the ventral tegmental area (VTA) display substantial heterogeneity in the features to which they respond, while retaining more consistent RPE-like responses at the time of reward [4]. We argue that the previously predominant family of extensions to the RPE model, which replicate the classic model in multiple parallel circuits, is ill-suited to explaining these and other results concerning DA heterogeneity within the VTA. Instead, we introduce a complementary “feature-specific RPE” model, which posits that DA neurons within the VTA report individual RPEs for different elements of a population vector code for an animal’s state (its moment-to-moment situation). To investigate this claim, we train a deep reinforcement learning model on a navigation and decision-making task and compare the feature-specific RPEs derived from the network to population recordings from DA neurons during the same task. The model recapitulates key aspects of VTA DA neuron heterogeneity. Further, we show how our framework can be extended to explain patterns of heterogeneity in action responses reported among DA neurons of the substantia nigra pars compacta (SNc) [5]. Thus, our work provides a path to reconcile new observations of DA neuron heterogeneity with classic ideas about RPE coding, while also offering a new perspective on how the brain performs reinforcement learning in high-dimensional environments.
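To make the core computation concrete, the sketch below works through one minimal reading of a feature-specific RPE under a linear value code: the state is a population feature vector, each channel carries the value contribution of a single feature, and each channel's temporal-difference error is computed from that contribution alone plus a share of the reward, so the channel errors sum to the classic scalar RPE. The even reward split, the variable names, and the per-channel weight update are illustrative assumptions chosen for exposition, not the deep reinforcement learning implementation used in the paper.

```python
import numpy as np

# Illustrative sketch of a feature-specific RPE (not the authors' code).
# State is a population vector phi; value is linear: V(s) = sum_i w_i * phi_i(s).

rng = np.random.default_rng(0)
n_features = 8    # elements of the population vector code for state
gamma = 0.95      # temporal discount factor
alpha = 0.10      # learning rate
w = np.zeros(n_features)  # one value weight per feature channel

def feature_specific_rpe(phi, phi_next, r_share, w):
    """Per-channel TD errors: each putative DA 'neuron' i sees only its own
    feature's value contribution w_i * phi_i and its share of the reward."""
    return r_share + gamma * w * phi_next - w * phi

# One transition (random feature vectors stand in for task states).
phi, phi_next, r = rng.random(n_features), rng.random(n_features), 1.0
r_share = np.full(n_features, r / n_features)  # assumed even reward split

delta = feature_specific_rpe(phi, phi_next, r_share, w)

# The channel errors sum to the classic scalar RPE, so averaging over the
# population recovers the familiar global teaching signal.
scalar_rpe = r + gamma * w @ phi_next - w @ phi
assert np.isclose(delta.sum(), scalar_rpe)

w += alpha * delta * phi  # each channel learns from its own error
```

Under this reading, response heterogeneity falls out of which feature each channel happens to track, while the population sum (or average) still behaves like a textbook scalar RPE at the time of reward.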

[1] Manivannan Subramaniyan, et al. Distributed processing for value-based choice by prelimbic circuits targeting anterior-posterior dorsal striatal subregions in male mice, 2023, Nature Communications.

[2] Theodore H. Moskovitz, et al. Action prediction error: a value-free dopaminergic teaching signal that drives stable learning, 2024, bioRxiv.

[3] P. Magill, et al. Distributional coding of associative learning within projection-defined populations of midbrain dopamine neurons, 2022, bioRxiv.

[4] Jack W. Lindsey, et al. Action-modulated midbrain dopamine activity arises from distributed control policies, 2022, NeurIPS.

[5] Laura M. Haetzel, et al. Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning, 2022, Cell Reports.

[6] Mackenzie W. Mathis, et al. Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction, 2021, Neuron.

[7] J. Berke, et al. Striatal dopamine pulses follow a temporal discounting spectrum, 2021.

[8] A. Fairhall, et al. Context-Dependent Representations of Movement in Drosophila Dopaminergic Reinforcement Pathways, 2021, Nature Neuroscience.

[9] M. Carandini, et al. Dopamine axons in dorsal striatum encode contralateral visual stimuli and choices, 2021, The Journal of Neuroscience.

[10] Arif A. Hamid, et al. Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment, 2021, Cell.

[11] N. Uchida, et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task, 2020, bioRxiv.

[12] Ilana B. Witten, et al. A comparison of dopaminergic and cholinergic populations reveals unique contributions of VTA dopamine neurons to short-term memory, 2020, bioRxiv.

[13] S. Lammel, et al. Aversion hot spots in the dopamine system, 2020, Current Opinion in Neurobiology.

[14] Benjamin T. Saunders, et al. Heterogeneity in striatal dopamine circuits: Form and function in dynamic reward seeking, 2020, Journal of Neuroscience Research.

[15] Gregory W. Gundersen, et al. Distinct signals in medial and lateral VTA dopamine neurons modulate fear extinction at different times, 2020, bioRxiv.

[16] D. Hassabis, et al. A distributional code for value in dopamine-based reinforcement learning, 2020, Nature.

[17] Yves Kremer, et al. Context-Dependent Multiplexing by Individual VTA Dopamine Neurons, 2019, The Journal of Neuroscience.

[18] Rafal Bogacz, et al. Dopamine role in learning and action inference, 2019, bioRxiv.

[19] HyungGoo R. Kim, et al. The role of state uncertainty in the dynamics of dopamine, 2019, Current Biology.

[20] Samuel J. Gershman, et al. A Unified Framework for Dopamine Signals across Timescales, 2019, Cell.

[21] Y. Niv. Learning task-state representations, 2019, Nature Neuroscience.

[22] Ashok Litwin-Kumar, et al. Models of heterogeneous dopamine signaling in an insect learning and memory center, 2019, bioRxiv.

[23] P. Kaeser, et al. Mechanisms and regulation of dopamine release, 2019, Current Opinion in Neurobiology.

[24] L. Wilbrecht, et al. Imaging striatal dopamine release using a nongenetically encoded near infrared fluorescent catecholamine nanosensor, 2019, Science Advances.

[25] Ilana B. Witten, et al. Striatal circuits for reward learning and decision-making, 2019, Nature Reviews Neuroscience.

[26] Ilana B. Witten, et al. Specialized coding of sensory, motor, and cognitive variables in VTA dopamine neurons, 2019, Nature.

[27] Marc G. Bellemare, et al. Statistics and Samples in Distributional Reinforcement Learning, 2019, ICML.

[28] Christina K. Kim, et al. A Neural Circuit Mechanism for Encoding Aversive Stimuli in the Mesolimbic Dopamine System, 2019, Neuron.

[29] Geoffrey Schoenbaum, et al. Rethinking dopamine as generalized prediction error, 2018, bioRxiv.

[30] Ilana B. Witten, et al. Reward prediction error does not explain movement selectivity in DMS-projecting dopamine neurons, 2018, bioRxiv.

[31] Luke T. Coddington, et al. The timing of action determines reward prediction signals in identified midbrain dopamine neurons, 2018, Nature Neuroscience.

[32] David Robbe. To move or to sense? Incorporating somatosensory representation into striatal functions, 2018, Current Opinion in Neurobiology.

[33] N. Uchida, et al. Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli, 2018, Nature Neuroscience.

[34] Benjamin T. Saunders, et al. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties, 2018, Nature Neuroscience.

[35] P. Kaeser, et al. Dopamine Secretion Is Mediated by Sparse Active Zone-like Release Sites, 2018, Cell.

[36] R. Costa, et al. Dopamine neuron activity before action initiation gates and invigorates future movements, 2018, Nature.

[37] Ben Deverett, et al. An Accumulation-of-Evidence Task Using Visual Pulses for Mice Navigating in Virtual Reality, 2017, bioRxiv.

[38] Joseph J. Paton, et al. The many worlds hypothesis of dopamine prediction error: implications of a parallel circuit architecture in the basal ganglia, 2017, Current Opinion in Neurobiology.

[39] Kimberly L. Stachenfeld, et al. The hippocampus as a predictive map, 2017, Nature Neuroscience.

[40] Anne E. Carpenter, et al. Reconstructing cell cycle and disease progression using deep learning, 2017, Nature Communications.

[41] E. Koechlin, et al. The Importance of Falsification in Computational Cognitive Modeling, 2017, Trends in Cognitive Sciences.

[42] Adam Kepecs, et al. Midbrain Dopamine Neurons Signal Belief in Choice Accuracy during a Perceptual Decision, 2017, Current Biology.

[43] Samuel Gershman, et al. Predictive representations can link model-based reinforcement learning to model-free mechanisms, 2017, bioRxiv.

[44] N. Uchida, et al. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice, 2016, eLife.

[45] G. Schoenbaum, et al. Ventral striatal lesions disrupt dopamine neuron signaling of differences in cue value caused by changes in reward timing but not number, 2016, Behavioral Neuroscience.

[46] Tianyi Mao, et al. A comprehensive excitatory input map of the striatum reveals novel functional organization, 2016, eLife.

[47] N. Uchida, et al. Midbrain dopamine neurons signal aversion in a reward-context-dependent manner, 2016, eLife.

[48] M. Howe, et al. Rapid signaling in distinct dopaminergic axons during locomotion and reward, 2016, Nature.

[49] Nicholas N. Foster, et al. The mouse cortico-striatal projectome, 2016, Nature Neuroscience.

[50] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.

[51] J. Schulman, et al. OpenAI Gym, 2016, arXiv.

[52] P. Dayan, et al. Safety out of control: dopamine and defence, 2016, Behavioral and Brain Functions.

[53] Jakob K. Dreyer, et al. Representation of spontaneous movement by dopaminergic neurons is cell-type selective and disrupted in parkinsonism, 2016, Proceedings of the National Academy of Sciences.

[54] Ilana B. Witten, et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target, 2016, Nature Neuroscience.

[55] N. Uchida, et al. Dopamine neurons share common response function for reward prediction error, 2016, Nature Neuroscience.

[56] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[57] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[58] R. Awatramani, et al. Molecular heterogeneity of midbrain dopaminergic neurons – Moving toward single cell resolution, 2015, FEBS Letters.

[59] R. Bogacz, et al. Action Initiation Shapes Mesolimbic Dopamine Encoding of Future Rewards, 2015, Nature Neuroscience.

[60] E. Benarroch, et al. Heterogeneity of the midbrain dopamine system, 2015, Neurology.

[61] Vaughn L. Hetrick, et al. Mesolimbic Dopamine Signals the Value of Work, 2015, Nature Neuroscience.

[62] Y. Niv, et al. Discovering latent causes in reinforcement learning, 2015, Current Opinion in Behavioral Sciences.

[63] Talia N. Lerner, et al. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits, 2015, Cell.

[64] M. Rice, et al. Somatodendritic dopamine release: recent mechanistic insights, 2015, Philosophical Transactions of the Royal Society B: Biological Sciences.

[65] Joseph W. Barter, et al. Beyond reward prediction errors: the role of dopamine in movement kinematics, 2015, Frontiers in Integrative Neuroscience.

[66] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[67] P. Rueda-Orozco, et al. The striatum multiplexes contextual and kinematic information to constrain motor habits execution, 2014, Nature Neuroscience.

[68] M. Marinelli, et al. Heterogeneity of dopamine neuron activity across traits and states, 2014, Neuroscience.

[69] Dmitriy Aronov, et al. Engagement of Neural Circuits Underlying 2D Spatial Navigation in a Rodent Virtual Reality System, 2014, Neuron.

[70] Samuel Gershman, et al. Dopamine Ramps Are a Consequence of Reward Prediction Errors, 2014, Neural Computation.

[71] S. Lammel, et al. Reward and aversion in a heterogeneous midbrain dopamine system, 2014, Neuropharmacology.

[72] Robert C. Wilson, et al. Orbitofrontal Cortex as a Cognitive Map of Task Space, 2014, Neuron.

[73] K. Deisseroth, et al. Input-specific control of reward and aversion in the ventral tegmental area, 2012, Nature.

[74] M. Frank, et al. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis, 2012, Cerebral Cortex.

[75] Nathaniel D. Daw, et al. Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning, 2011, PLoS Computational Biology.

[76] Robert C. Wilson, et al. Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex, 2011, Nature Neuroscience.

[77] Joseph T. McGuire, et al. A Neural Signature of Hierarchical Reinforcement Learning, 2011, Neuron.

[78] S. Wise, et al. Frontal pole cortex: encoding ends at the end of the endbrain, 2011, Trends in Cognitive Sciences.

[79] Tianyi Mao, et al. Inputs to the Dorsal Striatum of the Mouse Reflect the Parallel Circuit Architecture of the Forebrain, 2010, Frontiers in Neuroanatomy.

[80] Xin Jin, et al. Start/stop signals emerge in nigrostriatal circuits during sequence learning, 2010, Nature.

[81] M. Botvinick, et al. Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective, 2009, Cognition.

[82] N. Daw, et al. Human Reinforcement Learning Subdivides Structured Action Spaces by Learning Effector-Specific Values, 2009, The Journal of Neuroscience.

[83] Zeb Kurth-Nelson, et al. Temporal-Difference Reinforcement Learning with Distributed Representations, 2009, PLoS ONE.

[84] O. Hikosaka, et al. Two types of dopamine neuron distinctly convey positive and negative motivational signals, 2009, Nature.

[85] F. Fujiyama, et al. Single Nigrostriatal Dopaminergic Neurons Form Widely Spread and Highly Dense Axonal Arborizations in the Neostriatum, 2009, The Journal of Neuroscience.

[86] A. Graybiel. Habits, rituals, and the evaluative brain, 2008, Annual Review of Neuroscience.

[87] J. Wickens, et al. Space, time and dopamine, 2007, Trends in Neurosciences.

[88] David S. Touretzky, et al. Representation and Timing in Theories of the Dopamine System, 2006, Neural Computation.

[89] David S. Touretzky, et al. Similarity and Discrimination in Classical Conditioning: A Latent Variable Account, 2004, NIPS.

[90] T. Robbins, et al. Putting a spin on the dorsal–ventral divide of the striatum, 2004, Trends in Neurosciences.

[91] Karl J. Friston, et al. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning, 2004, Science.

[92] Sham M. Kakade, et al. Opponent interactions between serotonin and dopamine, 2002, Neural Networks.

[93] Mitsuo Kawato, et al. Multiple Model-Based Reinforcement Learning, 2002, Neural Computation.

[94] J. Wickens, et al. A cellular mechanism of reward-related learning, 2001, Nature.

[95] A. Graybiel, et al. Neurons in the thalamic CM-Pf complex supply striatal neurons with information about behaviorally significant sensory events, 2001, Journal of Neurophysiology.

[96] Nikolaus R. McFarland, et al. Striatonigrostriatal Pathways in Primates Form an Ascending Spiral from the Shell to the Dorsolateral Striatum, 2000, The Journal of Neuroscience.

[97] C. I. Connolly, et al. Building neural representations of habits, 1999, Science.

[98] W. Schultz, et al. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task, 1999, Neuroscience.

[99] R. Rescorla. Learning about qualitatively different outcomes during a blocking procedure, 1999.

[100] Peter Dayan, et al. Statistical Models of Conditioning, 1997, NIPS.

[101] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.

[102] B. Bloch, et al. Ultrastructural localization of D1 dopamine receptor immunoreactivity in rat striatonigral neurons and its relation with dopaminergic innervation, 1996, Brain Research.

[103] P. Dayan, et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning, 1996, The Journal of Neuroscience.

[104] W. Estes. Toward a Statistical Theory of Learning, 1994.

[105] Karl J. Friston, et al. Value-dependent selection in the brain: Simulation in a synthetic neural model, 1994, Neuroscience.

[106] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[107] J. Pearce, et al. Effect of changing the unconditioned stimulus on appetitive blocking, 1988, Journal of Experimental Psychology: Animal Behavior Processes.

[108] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.

[109] J. Pearce, et al. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli, 1980, Psychological Review.

[110] L. Kamin. Attention-like processes in classical conditioning, 1967.

[111] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems, 1960, Journal of Basic Engineering.

[112] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.

[113] Pablo Tano, et al. A Local Temporal Difference Code for Distributional Reinforcement Learning, 2020, NeurIPS.

[114] A. Hall, et al. Adaptive Switching Circuits, 2016.

[115] A. Cooper, et al. Predictive Reward Signal of Dopamine Neurons, 2011.

[116] D. Blei, et al. Context, learning, and extinction, 2010, Psychological Review.

[117] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks, 2004, Machine Learning.

[118] Ronald J. Williams, et al. Learning representations by back-propagating errors, 2004.

[119] David S. Touretzky, et al. Timing and Partial Observability in the Dopamine System, 2002, NIPS.

[120] F. Gonon, et al. Geometry and kinetics of dopaminergic transmission in the rat striatum and in mice lacking the dopamine transporter, 2000, Progress in Brain Research.

[121] Joel L. Davis, et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement, 1994.

[122] J. Kruschke. ALCOVE: an exemplar-based connectionist model of category learning, 1992, Psychological Review.

[123] G. E. Alexander, et al. Parallel organization of functionally segregated circuits linking basal ganglia and cortex, 1986, Annual Review of Neuroscience.

[124] R. Rescorla, et al. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, 1972.

[125] J. von Neumann, et al. Theory of Games and Economic Behavior, 2nd rev. ed., 1947.
