Spatiotemporal neural characterization of prediction error valence and surprise during reward learning in humans

Reward learning depends on accurate reward associations with potential choices. These associations can be attained with reinforcement learning mechanisms using a reward prediction error (RPE) signal (the difference between actual and expected rewards) for updating future reward expectations. Despite an extensive body of literature on the influence of RPE on learning, little has been done to investigate the potentially separate contributions of RPE valence (positive or negative) and surprise (absolute degree of deviation from expectations). Here, we coupled single-trial electroencephalography (EEG) with simultaneously acquired fMRI during a probabilistic reversal-learning task to offer evidence of temporally overlapping but largely distinct spatial representations of RPE valence and surprise. Electrophysiological variability in RPE valence correlated with activity in regions of the human reward network promoting approach or avoidance learning. Electrophysiological variability in RPE surprise correlated primarily with activity in regions of the human attentional network controlling the speed of learning. Crucially, despite the largely separate spatial extent of these representations, our EEG-informed fMRI approach uniquely revealed a linear superposition of the two RPE components in a smaller network encompassing visuo-mnemonic and reward areas. Activity in this network was further predictive of stimulus value updating, indicating a comparable contribution of both signals to reward learning.
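The distinction between the signed RPE, its valence, and its surprise can be illustrated with a minimal sketch. It assumes a simple delta-rule (Rescorla-Wagner style) value update; the function name `rpe_components`, the learning rate `alpha`, and the initial expectation `v0` are hypothetical choices made for illustration and are not taken from the study's own model.

```python
def rpe_components(rewards, alpha=0.3, v0=0.5):
    """Split each trial's reward prediction error into valence and surprise.

    Assumptions (illustrative only): a single stimulus, a fixed learning
    rate `alpha`, and an initial expected reward `v0`.
    """
    value = v0
    trials = []
    for reward in rewards:
        rpe = reward - value                # actual minus expected reward
        valence = (rpe > 0) - (rpe < 0)     # +1 positive, -1 negative, 0 none
        surprise = abs(rpe)                 # unsigned deviation from expectation
        value = value + alpha * rpe         # update the reward expectation
        trials.append(
            {"rpe": rpe, "valence": valence, "surprise": surprise, "value": value}
        )
    return trials


if __name__ == "__main__":
    # A rewarded trial followed by an unrewarded one.
    for trial in rpe_components([1.0, 0.0], alpha=0.5, v0=0.5):
        print(trial)
```

In this sketch the two components carry different information: valence records only the direction of the error, whereas surprise records only its magnitude, so two trials with opposite valence can be equally surprising.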
