Model-based spatial navigation in the hippocampus-ventral striatum circuit: A computational analysis

While the neurobiology of simple and habitual choices is relatively well known, our current understanding of goal-directed choices and planning in the brain is still limited. Theoretical work suggests that goal-directed computations can be productively associated with model-based (reinforcement learning) computations, yet a detailed mapping between computational processes and neuronal circuits remains to be fully established. Here we report a computational analysis that aligns Bayesian nonparametrics and model-based reinforcement learning (MB-RL) with the functioning of the hippocampus (HC) and the ventral striatum (vStr), a neuronal circuit that is increasingly recognized as an appropriate model system for understanding goal-directed (spatial) decisions and planning mechanisms in the brain. We test the MB-RL agent in a contextual conditioning task that depends on intact hippocampal and ventral striatal (shell) function, and show that it solves the task while exhibiting key behavioral and neuronal signatures of the HC-vStr circuit. Our simulations also explore the benefits of biological forms of look-ahead prediction (forward sweeps) during both learning and control. This article thus helps to fill the gap between our current understanding of computational algorithms and biological realizations of (model-based) reinforcement learning.
