Temporal dynamics of prediction error processing during reward-based decision making

Adaptive decision making depends on the accurate representation of the rewards associated with potential choices. These representations can be acquired through reinforcement learning (RL) mechanisms, which use the prediction error (PE, the difference between expected and received rewards) as a learning signal to update reward expectations. While EEG experiments have highlighted the role of feedback-related potentials in performance monitoring, important questions remain about the temporal sequence of feedback processing and the specific function of feedback-related potentials during reward-based decision making. Here, we hypothesized that feedback processing starts with a qualitative evaluation of outcome valence, which is subsequently complemented by a quantitative representation of PE magnitude. Results of a model-based single-trial analysis of EEG data collected during a reversal learning task showed that around 220 ms after feedback, outcomes are initially evaluated categorically with respect to their valence (positive vs. negative). Around 300 ms, and in parallel with the maintained valence evaluation, the brain also represents quantitative information about PE magnitude, thus providing the complete information needed to update reward expectations and to guide adaptive decision making. Importantly, our single-trial EEG analysis based on PEs from an RL model showed that the feedback-related potentials do not merely reflect error awareness, but rather quantitative information crucial for learning reward contingencies.
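The PE-based update described above can be illustrated with a minimal Rescorla-Wagner-style sketch. This is not the authors' model; the learning rate (alpha = 0.3) and the reward values are illustrative assumptions chosen only to show how a signed PE both carries valence (its sign) and magnitude, and how it drives expectation updates after a contingency reversal.

```python
# Minimal sketch of a prediction-error (PE) driven value update, in the
# Rescorla-Wagner / RL style referenced in the abstract. All parameters
# (alpha, reward values) are illustrative assumptions, not study values.

def update_value(value, reward, alpha=0.3):
    """Return (updated expectation, PE) for one feedback event."""
    pe = reward - value          # PE: received minus expected reward
    return value + alpha * pe, pe

# A consistently rewarded option: the expectation climbs toward 1.0
# and the positive PE shrinks trial by trial.
v = 0.0
for _ in range(5):
    v, pe = update_value(v, reward=1.0)

# After a contingency reversal (reward withheld), a large negative PE
# pushes the expectation back down, as in a reversal learning task.
v_after, pe_after = update_value(v, reward=0.0)
```

Note that the PE's sign encodes outcome valence (positive vs. negative) while its absolute value encodes magnitude, mirroring the two stages of feedback processing proposed in the abstract.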
