Beyond Reward Prediction Errors: Human Striatum Updates Rule Values During Learning

Humans naturally group the world into coherent categories defined by membership rules. Rules can be learned implicitly, by building stimulus-response associations through reinforcement learning, or explicitly, through reasoning. We tested whether the striatum, where activation reliably scales with reward prediction error, tracks prediction errors in a task that requires explicit rule generation. Using functional magnetic resonance imaging during a categorization task, we show that striatal responses to feedback scale with a "surprise" signal derived from a Bayesian rule-learning model and are inconsistent with a reinforcement-learning prediction error. We also find that the striatum and the caudal inferior frontal sulcus (cIFS) are involved in updating the likelihoods of discriminative rules. We conclude that the striatum, in cooperation with the cIFS, updates the values assigned to categorization rules when people learn through explicit reasoning.
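The contrast at the heart of the abstract, a reinforcement-learning reward prediction error versus a Bayesian surprise signal over candidate rules, can be sketched in a few lines. This is a minimal illustration, not the authors' model: it assumes just two candidate rules, binary feedback, and hypothetical likelihood values, with surprise computed as the Shannon surprise of the feedback under the current rule posterior.

```python
import math

def rl_prediction_error(reward, value):
    """Classic reward prediction error: delta = r - V."""
    return reward - value

def bayesian_surprise(posterior, likelihoods):
    """Shannon surprise of the observed feedback:
    -log P(feedback), marginalizing over the current rule posterior."""
    p_feedback = sum(p * l for p, l in zip(posterior, likelihoods))
    return -math.log(p_feedback)

def update_posterior(posterior, likelihoods):
    """Bayes' rule: p(rule | feedback) is proportional to
    p(feedback | rule) * p(rule)."""
    unnorm = [p * l for p, l in zip(posterior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical example: the learner is undecided between two rules;
# the feedback is likely under rule A (0.9) but unlikely under rule B (0.1).
posterior = [0.5, 0.5]
likelihoods = [0.9, 0.1]

delta = rl_prediction_error(reward=1.0, value=0.5)      # 0.5
surprise = bayesian_surprise(posterior, likelihoods)     # -log(0.5) ~ 0.693
posterior = update_posterior(posterior, likelihoods)     # shifts toward rule A
```

The two signals can dissociate: the prediction error depends only on the scalar reward expectation, while the surprise signal depends on how consistent the feedback is with the currently entertained rules, which is the distinction the imaging analysis exploits.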
