Reinforcement Learning Signals in the Human Striatum Distinguish Learners from Nonlearners during Reward-Based Decision Making

The computational framework of reinforcement learning has been used to advance our understanding of the neural mechanisms underlying reward learning and decision-making behavior. Humans are known to vary widely in their performance on decision-making tasks. Here, we used a simple four-armed bandit task in which subjects split almost evenly into two groups on the basis of their performance: those who learn to favor choice of the optimal action and those who do not. Using models of reinforcement learning, we sought to determine the neural basis of these intrinsic differences in performance by scanning both groups with functional magnetic resonance imaging. We scanned 29 subjects while they performed the reward-based decision-making task. Our results suggest that these two groups differ markedly in the degree to which reinforcement learning signals in the striatum are engaged during task performance. While the learners showed robust prediction error signals in both the ventral and dorsal striatum during learning, the nonlearner group showed a marked absence of such signals. Moreover, the magnitude of prediction error signals in a region of dorsal striatum correlated significantly with a measure of behavioral performance across all subjects. These findings support a crucial role of prediction error signals, likely originating from dopaminergic midbrain neurons, in enabling learning of action selection preferences on the basis of obtained rewards. Thus, spontaneously observed individual differences in decision-making performance demonstrate the dependence of this type of learning on the functional integrity of the dopaminergic striatal system in humans.
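The class of model referred to in the abstract can be illustrated with a minimal sketch. The code below is not the authors' fitted model; it is a generic prediction-error (Rescorla-Wagner-style) learner with softmax action selection on a four-armed bandit, with illustrative parameter values (`alpha`, `beta`, and the per-arm reward probabilities are assumptions chosen for the example):

```python
import math
import random


def softmax(values, beta):
    """Convert action values into choice probabilities (inverse temperature beta)."""
    exps = [math.exp(beta * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def run_bandit(n_trials=200, alpha=0.3, beta=3.0,
               reward_probs=(0.8, 0.4, 0.3, 0.2), seed=0):
    """Simulate a prediction-error learner on a four-armed bandit.

    On each trial: choose an arm via softmax over current values,
    sample a binary reward, compute the prediction error
    delta = reward - value, and update the chosen arm's value.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    choices = []
    for _ in range(n_trials):
        probs = softmax(values, beta)
        action = rng.choices(range(len(values)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        delta = reward - values[action]   # reward prediction error
        values[action] += alpha * delta   # incremental value update
        choices.append(action)
    return values, choices
```

In this framework, the trial-by-trial `delta` term is the quantity whose neural correlate the study probes in the striatum: a "learner" corresponds to an agent whose values are effectively updated by `delta`, whereas setting `alpha` near zero yields nonlearner-like behavior with no preference for the optimal arm.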
