Human Dorsal Striatal Activity during Choice Discriminates Reinforcement Learning Behavior from the Gambler's Fallacy

Reinforcement learning theory has generated substantial interest in neurobiology, particularly because of the resemblance between phasic dopamine activity and reward prediction errors. Actor–critic theories have been adapted to account for the functions of the striatum, with parts of the dorsal striatum equated to the actor. Here, we specifically test whether the human dorsal striatum, as predicted by an actor–critic instantiation, is used on a trial-to-trial basis at the time of choice to guide choices in accordance with reinforcement learning theory, as opposed to a competing strategy: the gambler's fallacy. Using a partial-brain functional magnetic resonance imaging protocol focused on the striatum and other ventral brain areas, we found that the dorsal striatum is more active when choices are consistent with reinforcement learning than when they are consistent with the competing strategy. Moreover, activity in an overlapping area of dorsal striatum, along with the ventral striatum, was correlated with reward prediction errors at the time of outcome, as predicted by the actor–critic framework. These findings suggest that the same region of dorsal striatum involved in learning stimulus–response associations may contribute to the control of behavior during choice, thereby using those learned associations. Intriguingly, neither reinforcement learning nor the gambler's fallacy conformed to the optimal choice strategy on the specific decision-making task we used. Thus, the dorsal striatum may contribute to the control of behavior according to reinforcement learning even when the prescriptions of such an algorithm are suboptimal in terms of maximizing future rewards.
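
The contrast between the two strategies can be illustrated with a minimal Python sketch. It is not the model or the task used in the study: the two-option setup, the delta-rule update, the learning rate `alpha`, and the win-stay/lose-shift labeling rule are all simplifying assumptions introduced here for illustration. The sketch computes a reward prediction error at outcome and labels a subsequent choice as consistent with either reinforcement learning (repeat what was just rewarded, avoid what was not) or the gambler's fallacy (expect the outcome to reverse).

```python
def prediction_error(reward, value):
    """Reward prediction error: obtained reward minus expected value."""
    return reward - value


def update_value(value, reward, alpha=0.1):
    """Delta-rule update (alpha is an assumed learning rate)."""
    return value + alpha * prediction_error(reward, value)


def classify_choice(chosen, last_chosen, last_reward):
    """Label a choice under a simplified, hypothetical two-option rule.

    After a rewarded trial, repeating the option follows reinforcement
    learning, while switching follows the gambler's fallacy (a win is
    "unlikely to repeat"). After an unrewarded trial the labels flip.
    """
    repeated = chosen == last_chosen
    rewarded = last_reward > 0
    return "reinforcement_learning" if repeated == rewarded else "gamblers_fallacy"


if __name__ == "__main__":
    value = update_value(0.5, reward=1.0)      # value moves toward the reward
    label = classify_choice("A", "A", 1.0)     # stayed after a win
    print(value, label)
```

Under these assumptions, every trial receives exactly one of the two labels, which is what allows activity at the time of choice to be compared between strategy-consistent trial types.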
