Neural Correlates of Forward Planning in a Spatial Decision Task in Humans

Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting a model-based RL algorithm, which relies on learning a map or model of the task and planning within it, with traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects to continually relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals with decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.
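The contrast between the two algorithm classes can be illustrated with a toy example. The sketch below is not the models fitted in the study: the five-state maze, the parameter values, and all function names are assumptions made purely for illustration. It shows why a layout change separates the two approaches: a model-free learner keeps its cached action values until it re-experiences the affected route, whereas a model-based learner replans over its updated map and revalues the routes at once.

```python
# Toy contrast between model-free TD (Q-learning) and model-based planning.
# The maze, parameters, and names are illustrative assumptions only.

GAMMA = 0.9  # discount factor (assumed)
ALPHA = 0.5  # TD learning rate (assumed)


def td_update(q, state, action, reward, next_state, actions_of):
    """One model-free step: move a cached value toward the TD target."""
    next_best = max(
        (q.get((next_state, a), 0.0) for a in actions_of(next_state)),
        default=0.0,
    )
    target = reward + GAMMA * next_best
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (target - old)


def plan(transitions, rewards, sweeps=50):
    """Model-based evaluation: value iteration over a learned map.

    transitions: state -> {action: next_state} (deterministic maze)
    rewards: state -> reward received on entering that state
    """
    v = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, acts in transitions.items():
            if acts:  # states without actions (goal, dead end) keep value 0
                v[s] = max(rewards.get(n, 0.0) + GAMMA * v[n] for n in acts.values())
    return v


def run_episode(q, transitions, rewards, path):
    """Replay a fixed action sequence from state 'A', applying TD updates."""
    state = "A"
    for action in path:
        nxt = transitions[state][action]
        td_update(q, state, action, rewards.get(nxt, 0.0), nxt,
                  lambda s: transitions.get(s, {}))
        state = nxt


# Short route A -> B -> G and long route A -> C -> D -> G; G is the goal.
maze = {
    "A": {"left": "B", "right": "C"},
    "B": {"fwd": "G"},
    "C": {"fwd": "D"},
    "D": {"fwd": "G"},
    "G": {},
}
rewards = {"G": 1.0}

# Model-free learning on the original layout.
q = {}
for _ in range(50):
    run_episode(q, maze, rewards, ["left", "fwd"])
    run_episode(q, maze, rewards, ["right", "fwd", "fwd"])

# Layout change: the door from B to the goal closes, so B is a dead end.
maze["B"] = {}

# The model-based learner replans on the updated map ...
v = plan(maze, rewards)
mb_values = {a: rewards.get(n, 0.0) + GAMMA * v[n] for a, n in maze["A"].items()}
# ... while the model-free learner still holds its stale cached values.
td_values = {a: q[("A", a)] for a in maze["A"]}

print("model-based:", mb_values)
print("model-free (stale):", td_values)
```

In this toy case the cached TD values still favor the now-blocked left route, while the replanned model-based values favor the right route. Divergences of this kind after a layout change are what allow the two accounts to be told apart from choices.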
