Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning

Reinforcement learning (RL) provides an influential characterization of the brain's mechanisms for learning to make advantageous choices. An important problem, though, is how complex tasks can be represented in a way that enables efficient learning. We consider this problem through the lens of spatial navigation, examining how two of the brain's location representations, hippocampal place cells and entorhinal grid cells, are adapted to serve as basis functions for approximating value over space for RL. Although much previous work has focused on these systems' roles in combining upstream sensory cues to track location, revisiting these representations with a focus on how they support this downstream decision function offers complementary insights into their characteristics. Rather than localization, the key problem in learning is generalization between past and present situations, which may not match perfectly. Accordingly, although neural populations collectively offer a precise representation of position, our simulations of navigational tasks support the suggestion that RL gains efficiency from the more diffuse tuning of individual neurons, which allows learning about rewards to generalize over longer distances given fewer training experiences. However, work on generalization in RL suggests the underlying representation should respect the environment's layout. In particular, although it is often assumed that neurons track location in Euclidean coordinates (that a place cell's activity declines “as the crow flies” away from its peak), the relevant metric for value is geodesic: the distance along a path, around any obstacles. We formalize this intuition and present simulations showing how Euclidean, but not geodesic, representations can interfere with RL by generalizing inappropriately across barriers. Our proposal that place and grid responses should be modulated by geodesic distances yields novel predictions about how obstacles should affect spatial firing fields, and it offers a new viewpoint on data concerning both spatial codes.
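
The contrast between Euclidean and geodesic tuning can be illustrated with a small sketch. It is not the simulation used in this work: the grid layout, the Gaussian tuning curve, the field width `SIGMA`, and all function names below are assumptions chosen for illustration. Geodesic distance is approximated here by breadth-first search over a 4-connected grid, so in open space it reduces to Manhattan rather than true Euclidean path length.

```python
import math
from collections import deque


def make_grid(width, height, walls):
    """Return the set of free (x, y) cells in a width x height room."""
    return {(x, y) for x in range(width) for y in range(height)} - set(walls)


def geodesic_distances(free, source):
    """Shortest 4-connected path length from source to every reachable cell."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in free and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


def euclidean_distance(a, b):
    """Straight-line ("as the crow flies") distance, ignoring barriers."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def gaussian_tuning(distance, sigma):
    """Hypothetical place-field response that decays with distance from the peak."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


# Assumed layout: a 7x7 room split by a wall at x = 3, with one gap at y = 6.
WALLS = [(3, y) for y in range(6)]
FREE = make_grid(7, 7, WALLS)

CENTER = (2, 0)   # place-field peak, just left of the wall
PROBE = (4, 0)    # location just right of the wall
SIGMA = 2.0       # assumed field width, in grid cells

GEO = geodesic_distances(FREE, CENTER)
euclid_activation = gaussian_tuning(euclidean_distance(CENTER, PROBE), SIGMA)
geo_activation = gaussian_tuning(GEO[PROBE], SIGMA)

print(f"Euclidean distance: {euclidean_distance(CENTER, PROBE):.1f}, "
      f"activation: {euclid_activation:.3f}")
print(f"Geodesic distance:  {GEO[PROBE]}, activation: {geo_activation:.2e}")
```

In this toy layout the two locations are 2 cells apart in a straight line but 14 steps apart along the path through the gap. A Euclidean field centered on one side therefore responds strongly on the other side, so value learned at one location would spill across the wall; the geodesic field is nearly silent there.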
