Neural Prediction Errors Reveal a Risk-Sensitive Reinforcement-Learning Process in the Human Brain

Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric–psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
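To make the contrast between mean-tracking and risk-sensitive learning concrete, here is a minimal sketch. It assumes one common way of making a prediction-error learner risk sensitive: separate learning rates for positive and negative prediction errors. The abstract does not specify the model, so the update rule, the names `eta_plus` and `eta_minus`, and the reward values (a sure 20 versus a 50/50 gamble between 0 and 40) are illustrative assumptions, not the authors' method or task.

```python
import random


def update_value(value, reward, eta_plus, eta_minus):
    """One prediction-error update with sign-dependent learning rates.

    Assumption for illustration: eta_plus scales positive prediction
    errors and eta_minus scales negative ones. Equal rates give the
    standard mean-tracking rule.
    """
    delta = reward - value                     # reward prediction error
    eta = eta_plus if delta > 0 else eta_minus
    return value + eta * delta


def learn_cue(rewards, eta_plus, eta_minus, v0=0.0):
    """Return the trajectory of learned values for one cue."""
    value = v0
    trajectory = []
    for reward in rewards:
        value = update_value(value, reward, eta_plus, eta_minus)
        trajectory.append(value)
    return trajectory


def tail_mean(xs, n):
    """Average of the last n entries, to smooth trial-to-trial noise."""
    tail = xs[-n:]
    return sum(tail) / len(tail)


rng = random.Random(0)                         # fixed seed for repeatability
n_trials = 2000

# Two hypothetical cues with equal mean reward (20) but different variance.
sure_rewards = [20.0] * n_trials
risky_rewards = [rng.choice([0.0, 40.0]) for _ in range(n_trials)]

# Risk-neutral learner: symmetric learning rates.
neutral_risky = tail_mean(learn_cue(risky_rewards, 0.1, 0.1), 1000)

# Risk-averse learner: negative errors weigh more than positive ones.
averse_sure = learn_cue(sure_rewards, 0.1, 0.2)[-1]
averse_risky = tail_mean(learn_cue(risky_rewards, 0.1, 0.2), 1000)

print(f"sure cue, asymmetric rates:  {averse_sure:.2f}")
print(f"risky cue, symmetric rates:  {neutral_risky:.2f}")
print(f"risky cue, asymmetric rates: {averse_risky:.2f}")
```

Under these assumptions the sure cue is valued at its true mean regardless of the asymmetry, while the risky cue settles below the value learned with symmetric rates. That is the sense in which learned values of equal-mean cues can be modulated by experienced risk.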
