Belief state representation in the dopamine system

Learning to predict future outcomes is critical for driving appropriate behaviors. Reinforcement learning (RL) models have successfully accounted for such learning, relying on reward prediction errors (RPEs) signaled by midbrain dopamine neurons. It has been proposed that when sensory data provide only ambiguous information about which state an animal is in, it can predict reward based on a set of probabilities assigned to hypothetical states (called the belief state). Here we examine how dopamine RPEs and subsequent learning are regulated under state uncertainty. Mice are first trained in a task with two potential states defined by different reward amounts. During testing, intermediate-sized rewards are given in rare trials. Dopamine activity is a non-monotonic function of reward size, consistent with RL models operating on belief states. Furthermore, the magnitude of dopamine responses quantitatively predicts changes in behavior. These results establish the critical role of state inference in RL.

Dopamine neurons encode reward prediction errors (RPEs) that report the mismatch between expected reward and outcome for a given state. Here the authors report that when there is uncertainty about the current state, RPEs are calculated on the probabilistic representation of the current state, or belief state.
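To illustrate why an RL model operating on belief states can produce an RPE that is a non-monotonic function of reward size, here is a minimal sketch. It is not the authors' model. It assumes two hidden states with hypothetical reward amounts (`r_small`, `r_big`), equal prior probabilities, Gaussian perceptual noise (`sigma`) on the sensed reward size, and that the same reward size that set the belief is delivered again. All names and parameter values are illustrative assumptions.

```python
import math

# Hypothetical task parameters (assumptions, not taken from the paper).
R_SMALL = 1.0   # reward amount that defines the "small" state
R_BIG = 10.0    # reward amount that defines the "big" state
SIGMA = 1.5     # assumed perceptual noise on sensed reward size


def belief_big(r, r_small=R_SMALL, r_big=R_BIG, sigma=SIGMA):
    """Posterior probability of the big state after sensing reward r.

    Uses Bayes' rule with equal priors and Gaussian likelihoods
    centred on each state's trained reward amount.
    """
    log_like_big = -((r - r_big) ** 2) / (2.0 * sigma ** 2)
    log_like_small = -((r - r_small) ** 2) / (2.0 * sigma ** 2)
    # Logistic form of the two-hypothesis posterior.
    return 1.0 / (1.0 + math.exp(log_like_small - log_like_big))


def belief_state_rpe(r, r_small=R_SMALL, r_big=R_BIG, sigma=SIGMA):
    """RPE when the expectation is the belief-weighted state value."""
    b = belief_big(r, r_small, r_big, sigma)
    expected = b * r_big + (1.0 - b) * r_small
    return r - expected


if __name__ == "__main__":
    # Sweep reward sizes between the two trained amounts.
    for r in [1.0, 2.0, 3.0, 4.0, 5.5, 7.0, 8.0, 9.0, 10.0]:
        print(f"reward={r:5.1f}  belief_big={belief_big(r):.4f}  "
              f"rpe={belief_state_rpe(r):+.4f}")
```

Under these assumptions, a reward slightly above the small amount is still attributed to the small state, so the RPE is positive. A reward slightly below the big amount is attributed to the big state, so the RPE is negative. The RPE therefore rises, falls, and rises again across the range rather than increasing steadily with reward size.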
