A model of hippocampally dependent navigation, using the temporal difference learning rule

This paper presents a model of how hippocampal place cells might be used for spatial navigation in two watermaze tasks: the standard reference memory task and a delayed matching‐to‐place task. In the reference memory task, the escape platform occupies a single location, and rats gradually learn relatively direct paths to the goal over the course of days, performing a fixed number of trials each day. In the delayed matching‐to‐place task, the escape platform occupies a novel location on each day, and rats gradually acquire one‐trial learning, i.e., direct paths on the second trial of each day. The model uses a local, incremental, and statistically efficient connectionist algorithm called temporal difference learning in two distinct components. The first is a reinforcement‐based “actor‐critic” network that is a general model of classical and instrumental conditioning. In this case, it is applied to navigation, using place cells to provide information about state. By itself, the actor‐critic can learn the reference memory task, but this learning does not adapt flexibly when the platform location changes. We argue that one‐trial learning in the delayed matching‐to‐place task demands a goal‐independent representation of space. This is provided by the second component of the model: a network that uses temporal difference learning and self‐motion information to acquire consistent spatial coordinates in the environment. Each component of the model is necessary at a different stage of the task; the actor‐critic provides a way of transferring control to the component that performs best. The model successfully captures gradual acquisition in both tasks, and, in particular, the ultimate development of one‐trial learning in the delayed matching‐to‐place task. Place cells report a form of stable, allocentric information that is well‐suited to the various kinds of learning in the model. Hippocampus 2000;10:1–16. © 2000 Wiley‐Liss, Inc.
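
As an illustration of the first component only, the following is a minimal sketch of a temporal difference actor-critic driven by place-cell activity. It is not the authors' implementation: the Gaussian place fields, the grid layout of field centres, the eight headings, the softmax temperature, the learning rates, the discount factor, and the arena and platform sizes are all assumptions chosen for readability, and the function names are hypothetical. The second component (learned spatial coordinates) is not shown.

```python
import math
import random

N_ACTIONS = 8   # assumed: eight evenly spaced swimming directions
GAMMA = 0.95    # assumed discount factor
SIGMA = 0.2     # assumed place-field width (arena radius is 1)
BETA = 2.0      # assumed softmax gain for action selection


def make_place_cells(n_side=7):
    """Place-field centres on a square grid, clipped to the unit disc (illustrative)."""
    step = 2.0 / (n_side - 1)
    pts = [(-1 + i * step, -1 + j * step) for i in range(n_side) for j in range(n_side)]
    return [c for c in pts if math.hypot(c[0], c[1]) <= 1.0]


def place_activity(pos, centres):
    """Gaussian tuning of every place cell to the current position."""
    return [math.exp(-((pos[0] - cx) ** 2 + (pos[1] - cy) ** 2) / (2 * SIGMA ** 2))
            for cx, cy in centres]


def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(BETA * (p - m)) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_trial(critic_w, actor_w, centres, goal, rng,
              max_steps=300, lr_critic=0.05, lr_actor=0.05, step=0.1):
    """Run one trial, updating weights in place; returns the number of steps taken."""
    angle = rng.uniform(0, 2 * math.pi)
    pos = (0.95 * math.cos(angle), 0.95 * math.sin(angle))  # start near the wall
    for t in range(1, max_steps + 1):
        f = place_activity(pos, centres)
        value = dot(critic_w, f)                      # critic's value estimate
        probs = softmax([dot(w, f) for w in actor_w])
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        heading = 2 * math.pi * a / N_ACTIONS
        nxt = (pos[0] + step * math.cos(heading), pos[1] + step * math.sin(heading))
        if math.hypot(nxt[0], nxt[1]) > 1.0:
            nxt = pos                                 # blocked by the wall: stay put
        reached = math.hypot(nxt[0] - goal[0], nxt[1] - goal[1]) <= 0.1
        reward = 1.0 if reached else 0.0
        next_value = 0.0 if reached else dot(critic_w, place_activity(nxt, centres))
        delta = reward + GAMMA * next_value - value   # temporal difference error
        for i, fi in enumerate(f):
            critic_w[i] += lr_critic * delta * fi     # critic: improve the prediction
            actor_w[a][i] += lr_actor * delta * fi    # actor: reinforce the chosen action
        pos = nxt
        if reached:
            return t
    return max_steps


def simulate(n_trials=5, seed=0, goal=(0.4, 0.4)):
    """Repeat trials with a fixed platform, as in the reference memory task."""
    rng = random.Random(seed)
    centres = make_place_cells()
    critic_w = [0.0] * len(centres)
    actor_w = [[0.0] * len(centres) for _ in range(N_ACTIONS)]
    return [run_trial(critic_w, actor_w, centres, goal, rng) for _ in range(n_trials)]
```

Calling `simulate(n_trials=20)` returns the number of steps taken on each trial. Because the actor's preferences are tied to one platform location, moving `goal` between calls would require relearning, which is the inflexibility the abstract attributes to the actor-critic on its own.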

[28]  J. O’Keefe,et al.  Phase relationship between hippocampal place units and the EEG theta rhythm , 1993, Hippocampus.

[29]  B L McNaughton,et al.  Dynamics of the hippocampal ensemble code for space. , 1993, Science.

[30]  Peter Dayan,et al.  Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.

[31]  A. Berthoz,et al.  Neurons responding to whole-body motion in the primate hippocampus , 1994, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[32]  Michael Recce,et al.  A model of hippocampal function , 1994, Neural Networks.

[33]  R. Morris,et al.  Distinct components of spatial learning revealed by prior training and NMDA receptor blockade , 1995, Nature.

[34]  R. Sutherland,et al.  Configural association theory and the hippocampal formation: An appraisal and reconfiguration , 1995, Hippocampus.

[35]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[36]  D. Cain,et al.  Spatial learning without NMDA receptor-dependent long-term potentiation , 1995, Nature.

[37]  P. E. Sharp,et al.  Simulation of spatial learning in the Morris water maze by a neural network model of the hippocampal formation and nucleus accumbens , 1995, Hippocampus.

[38]  J. Taube Head direction cells recorded in the anterior thalamic nuclei of freely moving rats , 1995, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[39]  P. Dayan,et al.  A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[40]  L. F. Abbott,et al.  A Model of Spatial Map Formation in the Hippocampus of the Rat , 1999, Neural Computation.

[41]  H. Eichenbaum Is the rodent hippocampus just for ‘place’? , 1996, Current Opinion in Neurobiology.

[42]  J. O’Keefe,et al.  Geometric determinants of the place fields of hippocampal neurons , 1996, Nature.

[43]  B. McNaughton,et al.  Theta phase precession in hippocampal neuronal populations and the compression of temporal sequences , 1996, Hippocampus.

[44]  H. Eichenbaum,et al.  Conservation of hippocampal memory function in rats and humans , 1996, Nature.

[45]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[46]  S. Wiener Spatial, behavioral and sensory correlates of hippocampal CA1 complex spike cell activity: Implications for information processing functions , 1996, Progress in Neurobiology.

[47]  W B Levy,et al.  A sequence predicting CA3 is a flexible associator that learns and uses context to solve hippocampal‐like tasks , 1996, Hippocampus.

[48]  K. I. Blum,et al.  Functional significance of long-term potentiation for sequence learning and prediction. , 1996, Cerebral cortex.

[49]  James L. McClelland,et al.  Considerations arising from a complementary learning systems perspective on hippocampus and neocortex , 1996, Hippocampus.

[50]  Peter Dayan,et al.  A Neural Substrate of Prediction and Reward , 1997, Science.

[51]  Katsushi Ikeuchi,et al.  Symbolic visual learning , 1997 .

[52]  David S. Touretzky,et al.  Navigating with landmarks: computing goal locations from places codes , 1997 .

[53]  Jean-Arcady Meyer,et al.  BIOLOGICALLY BASED ARTIFICIAL NAVIGATION SYSTEMS: REVIEW AND PROSPECTS , 1997, Progress in Neurobiology.

[54]  C Kentros,et al.  Abolition of long-term stability of new hippocampal place cell maps by NMDA receptor blockade. , 1998, Science.

[55]  David Wood,et al.  Luddites must not block progress in genetics , 1999, Nature.

[56]  David S. Touretzky,et al.  Towards a Computational Theory of Rat Navigation , 1999 .

[57]  H. Eichenbaum,et al.  The global record of memory in hippocampal neuronal activity , 1999, Nature.

[58]  R. Morris,et al.  Delay‐dependent impairment of a matching‐to‐place task with chronic and intrahippocampal infusion of the NMDA‐antagonist D‐AP5 , 1999, Hippocampus.