Computational Properties of the Hippocampus Increase the Efficiency of Goal-Directed Foraging through Hierarchical Reinforcement Learning

The mammalian brain is thought to use a version of Model-based Reinforcement Learning (MBRL) to guide “goal-directed” behavior, wherein animals consider goals and make plans to acquire desired outcomes. However, conventional MBRL algorithms do not fully explain animals' ability to adapt rapidly to environmental changes or to learn multiple complex tasks. They also require extensive computation, suggesting that goal-directed behavior is cognitively expensive. We propose here that key features of processing in the hippocampus support a flexible MBRL mechanism for spatial navigation that is computationally efficient and can adapt quickly to change. We investigate this idea by implementing a computational MBRL framework that incorporates features inspired by computational properties of the hippocampus: a hierarchical representation of space, “forward sweeps” through future spatial trajectories, and context-driven remapping of place cells. We find that a hierarchical abstraction of space greatly reduces the computational load (mental effort) required for adaptation to changing environmental conditions, and allows efficient scaling to large problems. It also allows abstract knowledge gained at high levels to guide adaptation to new obstacles. Moreover, a context-driven remapping mechanism allows multiple tasks to be learned and remembered. Simulating dorsal or ventral hippocampal lesions in our computational framework qualitatively reproduces behavioral deficits observed in rodents with analogous lesions. The framework may thus embody key features of how the brain organizes MBRL to efficiently solve navigation and other difficult tasks.
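As a rough illustration of why a hierarchical abstraction of space can reduce planning effort, the following minimal sketch compares a flat search over every cell of a grid with a two-level search that first plans over coarse blocks and then refines the route one block at a time. It is not the framework described in the abstract: the open, obstacle-free grid, the use of breadth-first search as the planner, the count of expanded states as a stand-in for computational load, and all function names (`bfs`, `flat_plan`, `hierarchical_plan`) are assumptions made for illustration only.

```python
from collections import deque


def bfs(start, is_goal, neighbors):
    """Breadth-first search; returns (path, number of states expanded)."""
    parent = {start: None}
    queue = deque([start])
    expanded = 0
    while queue:
        node = queue.popleft()
        expanded += 1
        if is_goal(node):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], expanded
        for nxt in neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None, expanded


def grid_neighbors(size, allowed=lambda cell: True):
    """4-connected moves on an open size x size grid, filtered by `allowed`."""
    def neighbors(cell):
        x, y = cell
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and allowed(nxt):
                yield nxt
    return neighbors


def flat_plan(size, start, goal):
    """Plan directly over individual cells (no abstraction)."""
    return bfs(start, lambda c: c == goal, grid_neighbors(size))


def hierarchical_plan(size, block, start, goal):
    """Plan over coarse blocks first, then refine one block at a time."""
    def to_block(c):
        return (c[0] // block, c[1] // block)

    coarse_size = -(-size // block)  # ceiling division
    coarse_path, total = bfs(
        to_block(start), lambda b: b == to_block(goal), grid_neighbors(coarse_size)
    )
    path, cell = [start], start
    for here, there in zip(coarse_path, coarse_path[1:]):
        # Fine-level search restricted to the current block and the next one.
        allowed = lambda c, h=here, t=there: to_block(c) in (h, t)
        leg, n = bfs(cell, lambda c, t=there: to_block(c) == t,
                     grid_neighbors(size, allowed))
        total += n
        path += leg[1:]
        cell = leg[-1]
    # Final leg: reach the goal cell without leaving the goal block.
    last = coarse_path[-1]
    leg, n = bfs(cell, lambda c: c == goal,
                 grid_neighbors(size, lambda c: to_block(c) == last))
    return path + leg[1:], total + n


if __name__ == "__main__":
    _, flat_cost = flat_plan(32, (0, 0), (31, 31))
    _, hier_cost = hierarchical_plan(32, 8, (0, 0), (31, 31))
    print("states expanded, flat:", flat_cost)
    print("states expanded, hierarchical:", hier_cost)
```

In this toy setting the two-level planner only ever searches the coarse map plus two blocks at a time, so its cost grows with the number of blocks along the route rather than with the full area of the grid. With obstacles, a restricted fine-level search could fail to cross between adjacent blocks, so a fuller model would need to learn which block transitions are actually passable.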
