Task Learnability Modulates Surprise but Not Valence Processing for Reinforcement Learning in Probabilistic Choice Tasks

Abstract

The goal of temporal difference (TD) reinforcement learning is to maximize outcomes and improve future decision-making. It does so by using a prediction error (PE), which quantifies the difference between the expected and the obtained outcome. In gambling tasks, however, decision-making cannot be improved because the task is not learnable. Building on the idea that TD learning uses two independent pieces of information from the PE, its valence and its surprise, we asked which of these aspects is affected when a task is not learnable. We contrasted behavioral data and ERPs between a learning variant and a gambling variant of a simple two-armed bandit task, with outcome sequences matched across tasks. Participants were explicitly told that feedback could be used to improve performance in the learning task but not in the gambling task, and we predicted a corresponding modulation of these aspects of the PE. We used a model-based analysis of the ERP data to extract the neural footprints of valence and surprise information in the two tasks. Our results revealed that task learnability modulates reinforcement learning by suppressing surprise processing while leaving valence processing unaffected. On the basis of our model and the data, we propose that task learnability can selectively suppress TD learning and alter behavioral adaptation through a flexible cost–benefit arbitration.
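The split of the PE into valence and surprise can be made concrete with a minimal sketch of a delta-rule (one-step TD) learner for a two-armed bandit. The abstract does not give the model's equations. This sketch therefore assumes, as is common in this literature, that valence is the sign of the PE and surprise is its absolute magnitude. The function name `td_signals`, the learning rate `alpha`, and the initial value `v0` are illustrative assumptions, not the authors' fitted model. The per-trial valence and surprise values it returns are the kind of quantities that a model-based ERP analysis would use as trial-wise regressors.

```python
from dataclasses import dataclass


@dataclass
class TrialSignals:
    prediction_error: float  # signed PE: obtained outcome minus expected value
    valence: int             # assumed: sign of the PE (+1 better, -1 worse, 0 exact)
    surprise: float          # assumed: unsigned PE magnitude |PE|


def td_signals(choices, outcomes, alpha=0.2, n_options=2, v0=0.5):
    """Hypothetical delta-rule learner for a bandit task.

    Returns the per-trial PE decomposition and the final option values.
    """
    values = [v0] * n_options  # expected outcome for each option
    signals = []
    for choice, outcome in zip(choices, outcomes):
        pe = outcome - values[choice]        # prediction error
        valence = (pe > 0) - (pe < 0)        # sign of the PE as an int
        signals.append(TrialSignals(pe, valence, abs(pe)))
        values[choice] += alpha * pe         # update only the chosen option
    return signals, values
```

For example, with `alpha=0.5`, choosing option 0 twice (outcomes 1, then 0) and then option 1 once (outcome 1) yields PEs of 0.5, -0.75, and 0.5. The corresponding valences are +1, -1, and +1, and the surprises are 0.5, 0.75, and 0.5. Because both signals come from the same PE, any difference between tasks in how they are processed has to be estimated from the neural data, not from the learner itself.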
