Testing the Reward Prediction Error Hypothesis with an Axiomatic Model

Neuroimaging studies typically identify neural activity correlated with the predictions of highly parameterized models, like the many reward prediction error (RPE) models used to study reinforcement learning. Identified brain areas might encode RPEs or, alternatively, might only have activity correlated with RPE model predictions. Here, we use an alternative, axiomatic approach rooted in economic theory to formally test the entire class of RPE models on neural data. We show that measurements of human neural activity from the striatum, medial prefrontal cortex, amygdala, and posterior cingulate cortex satisfy necessary and sufficient conditions for the entire class of RPE models. However, activity measured from the anterior insula falsifies the axiomatic model, and therefore no RPE model can account for the activity measured in that region. Further analysis suggests the anterior insula might instead encode something related to the salience of an outcome. As cognitive neuroscience matures and models proliferate, formal approaches of this kind that assess entire model classes rather than specific model exemplars may take on increased significance.
