Generalization of value in reinforcement learning by humans

Research in decision‐making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus–reward or stimulus–response associations, behavior that is well described by reinforcement learning theories. However, basic reinforcement learning is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision‐making. A candidate mechanism for this type of learning comes from the domain of memory, which has highlighted a role for the hippocampus in learning of stimulus–stimulus relations, typically dissociated from the role of the striatum in stimulus–response learning. Here, we used functional magnetic resonance imaging and computational model‐based analyses to examine the joint contributions of these mechanisms to reinforcement learning. Humans performed a reinforcement learning task with added relational structure, modeled after tasks used to isolate hippocampal contributions to memory. On each trial participants chose one of four options, but the reward probabilities for pairs of options were correlated across trials. This (uninstructed) relationship between pairs of options potentially enabled an observer to learn about option values based on experience with the other options and to generalize across them. We observed blood oxygen level‐dependent (BOLD) activity related to learning in the striatum and also in the hippocampus. By comparing a basic reinforcement learning model to one augmented to allow feedback to generalize between correlated options, we tested whether choice behavior and BOLD activity were influenced by the opportunity to generalize across correlated options. Although such generalization goes beyond standard computational accounts of reinforcement learning and striatal BOLD, both choices and striatal BOLD activity were better explained by the augmented model. 
Consistent with the hypothesized role for the hippocampus in this generalization, functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants’ choices. Our results thus point toward an interactive model in which striatal reinforcement learning systems may employ relational representations typically associated with the hippocampus.
