Neural basis of learning guided by sensory confidence and reward value

Making efficient decisions requires combining present sensory evidence with previous reward values, and learning from the resulting outcome. To establish the underlying neural processes, we trained mice in a task that probed such decisions. Mouse choices conformed to a reinforcement learning model that estimates predicted value (reward value times sensory confidence) and prediction error (outcome minus predicted value). Predicted value was encoded in the pre-outcome activity of prelimbic frontal neurons and midbrain dopamine neurons. Prediction error was encoded in the post-outcome activity of dopamine neurons, which reflected not only reward value but also sensory confidence. Manipulations of these signals spared ongoing choices but profoundly affected subsequent learning. Learning depended on the pre-outcome activity of prelimbic neurons, but not dopamine neurons. Learning also depended on the post-outcome activity of dopamine neurons, but not prelimbic neurons. These results reveal the distinct roles of frontal and dopamine neurons in learning under uncertainty.
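The two quantities defined above, predicted value (reward value times sensory confidence) and prediction error (outcome minus predicted value), can be illustrated with a minimal sketch. This is not the fitted model from the study. The class name `ValueLearner`, the `learning_rate` parameter, the delta-rule update of the stored reward value, and the use of a left/right belief `p_left` as the confidence term are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ValueLearner:
    """Illustrative two-choice learner; all names and defaults are hypothetical."""

    value_left: float = 1.0     # stored reward value for choosing left
    value_right: float = 1.0    # stored reward value for choosing right
    learning_rate: float = 0.2  # assumed step size, not taken from the source

    def predicted_values(self, p_left: float) -> tuple[float, float]:
        # Predicted value = reward value x sensory confidence, where
        # p_left is the belief that the stimulus is on the left.
        p_right = 1.0 - p_left
        return self.value_left * p_left, self.value_right * p_right

    def choose(self, p_left: float) -> str:
        # Pick the option with the higher predicted value (ties go left).
        q_left, q_right = self.predicted_values(p_left)
        return "left" if q_left >= q_right else "right"

    def learn(self, choice: str, p_left: float, outcome: float) -> float:
        # Prediction error = outcome - predicted value of the chosen option.
        q_left, q_right = self.predicted_values(p_left)
        predicted = q_left if choice == "left" else q_right
        error = outcome - predicted
        # Assumed delta rule: move the chosen option's stored value by
        # a fraction of the prediction error.
        if choice == "left":
            self.value_left += self.learning_rate * error
        else:
            self.value_right += self.learning_rate * error
        return error


if __name__ == "__main__":
    # Same reward, different sensory confidence: the low-confidence trial
    # yields the larger prediction error.
    for p_left in (0.9, 0.6):
        learner = ValueLearner()
        choice = learner.choose(p_left)
        error = learner.learn(choice, p_left, outcome=1.0)
        print(p_left, choice, error, learner.value_left)
```

Under these assumptions, a rewarded choice made with low sensory confidence produces a larger prediction error, and therefore a larger value update, than the same reward obtained with high confidence. This is one way a prediction error can reflect sensory confidence as well as reward value.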
