The Successor Representation and Temporal Context

The successor representation was introduced into reinforcement learning by Dayan (1993) as a means of facilitating generalization between states with similar successors. Although reinforcement learning in general has been used extensively as a model of psychological and neural processes, the psychological validity of the successor representation has yet to be explored. An interesting possibility is that the successor representation can be used not only for reinforcement learning but for episodic learning as well. Our main contribution is to show that a variant of the temporal context model (TCM; Howard & Kahana, 2002), an influential model of episodic memory, can be understood as directly estimating the successor representation using the temporal difference learning algorithm (Sutton & Barto, 1998). This insight leads to a generalization of TCM and new experimental predictions. In addition to casting a new normative light on TCM, this equivalence suggests a previously unexplored point of contact between different learning systems.
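
As a rough illustration of the quantity the abstract refers to, the sketch below estimates a successor representation with a tabular temporal difference update and compares it against a reference value. It is not the TCM variant described above, only a minimal textbook-style example under stated assumptions: the successor representation is taken to be the expected discounted count of future visits to each state, and the three-state transition matrix P, the discount gamma, the learning rate alpha, and the function names sr_fixed_point and sr_td are all hypothetical choices made for illustration.

```python
import random

def sr_fixed_point(P, gamma, n_iter=200):
    """Reference SR obtained by iterating M <- I + gamma * P * M."""
    n = len(P)
    M = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n_iter):
        M = [[float(i == j) + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return M

def sr_td(P, gamma, alpha=0.05, n_steps=20000, seed=0):
    """Estimate the SR from one sampled trajectory with a TD(0)-style update."""
    rng = random.Random(seed)
    n = len(P)
    M = [[float(i == j) for j in range(n)] for i in range(n)]
    s = 0
    for _ in range(n_steps):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(n), weights=P[s])[0]
        for j in range(n):
            # Target: visit indicator now, plus discounted prediction from s_next.
            target = float(s == j) + gamma * M[s_next][j]
            M[s][j] += alpha * (target - M[s][j])
        s = s_next
    return M

# Hypothetical three-state chain that mostly cycles 0 -> 1 -> 2 -> 0.
P = [[0.1, 0.9, 0.0],
     [0.0, 0.1, 0.9],
     [0.9, 0.0, 0.1]]
gamma = 0.5

M_ref = sr_fixed_point(P, gamma)
M_td = sr_td(P, gamma)

for row_ref, row_td in zip(M_ref, M_td):
    print([round(x, 2) for x in row_ref], [round(x, 2) for x in row_td])
```

Each row of the reference matrix sums to 1 / (1 - gamma), since every time step contributes one discounted visit. Because the learning rate is held constant, the sampled estimate fluctuates around the reference values rather than settling exactly on them.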

[1]  Richard S. Sutton, et al.  Introduction to Reinforcement Learning, 1998.

[2]  M. J. Kahana, et al.  Adult age differences in the temporal characteristics of category free recall, 1998, Psychology and aging.

[3]  Alana T. Wong, et al.  Remembering the past and imagining the future: Common and distinct neural substrates during event construction and elaboration, 2007, Neuropsychologia.

[4]  J. Deese, et al.  Serial effects in recall of unorganized and sequentially organized verbal material, 1957, Journal of experimental psychology.

[5]  Sean R. Eddy, et al.  What is dynamic programming?, 2004, Nature Biotechnology.

[6]  Scott D. Brown, et al.  The simplest complete model of choice response time: Linear ballistic accumulation, 2008, Cognitive Psychology.

[7]  D. Hassabis, et al.  Patients with hippocampal amnesia cannot imagine new experiences, 2007, Proceedings of the National Academy of Sciences.

[8]  J. Lisman, et al.  Storage, recall, and novelty detection of sequences by the hippocampus: Elaborating on the SOCRATIC model to account for normal and aberrant effects of dopamine, 2001, Hippocampus.

[9]  J. Lisman, et al.  Hippocampus as comparator: Role of the two input and two output systems of the hippocampus in selection and registration of information, 2001, Hippocampus.

[10]  D. Schacter, et al.  Remembering the past to imagine the future: the prospective brain, 2007, Nature Reviews Neuroscience.

[11]  Per B. Sederberg, et al.  The temporal contiguity effect predicts episodic memory performance, 2010, Memory & cognition.

[12]  Marc W. Howard, et al.  Spacing and lag effects in free recall of pure lists, 2005, Psychonomic bulletin & review.

[13]  Eric A. Zilli, et al.  Modeling the role of working memory and episodic memory in behavioral tasks, 2008, Hippocampus.

[14]  Gordon D. A. Brown, et al.  A temporal ratio model of memory, 2007, Psychological review.

[15]  Ed Vul, et al.  Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory, 2009, NIPS.

[16]  Roland S. G. Jones.  Entorhinal-hippocampal connections: a speculative view of their function, 1993, Trends in Neurosciences.

[17]  Adler J. Perotte, et al.  A Bayesian Analysis of Dynamics in Free Recall, 2009, NIPS.

[18]  Joseph T. McGuire, et al.  A Neural Signature of Hierarchical Reinforcement Learning, 2011, Neuron.

[19]  P. Dayan, et al.  States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning, 2010, Neuron.

[20]  K. Szpunar, et al.  Neural substrates of envisioning the future, 2007, Proceedings of the National Academy of Sciences.

[21]  Y. Miyashita, et al.  Neural organization for the long-term memory of paired associates, 1991, Nature.

[22]  J. Byrne.  Learning and memory: a comprehensive reference, 2008.

[23]  G. Altmann, et al.  The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye-movements, 2003.

[24]  R. Rescorla.  A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement, 1972.

[25]  Marc W. Howard, et al.  The temporal context model in spatial navigation and relational learning: toward a common explanation of medial temporal lobe function across domains, 2005, Psychological review.

[26]  Doina Precup, et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[27]  Sridhar Mahadevan, et al.  Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2007, J. Mach. Learn. Res.

[28]  Sean M. Polyn, et al.  A context maintenance and retrieval model of organizational processes in free recall, 2009, Psychological review.

[29]  Richard S. Sutton, et al.  Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[30]  Marc W. Howard, et al.  A distributed representation of temporal context, 2002.

[31]  Peter Dayan, et al.  Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.

[32]  A. Glenberg, et al.  A temporal distinctiveness theory of recency and modality effects, 1986, Journal of experimental psychology. Learning, memory, and cognition.

[33]  M. Botvinick, et al.  Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective, 2009, Cognition.

[34]  G. Altmann, et al.  Incremental interpretation at verbs: restricting the domain of subsequent reference, 1999, Cognition.

[35]  E. W. Kairiss, et al.  Hebbian synapses: biophysical mechanisms and algorithms, 1990, Annual review of neuroscience.

[36]  John R. Anderson, et al.  Recognition and retrieval processes in free recall, 1972.

[37]  C. Atance, et al.  Episodic future thinking, 2001, Trends in Cognitive Sciences.

[38]  S. Becker, et al.  Remembering the past and imagining the future: a neural model of spatial memory and imagery, 2007, Psychological review.

[39]  D. Kumaran, et al.  Which computational mechanisms operate in the hippocampus during novelty detection?, 2007, Hippocampus.

[40]  Marc W. Howard, et al.  Contextual variability and serial position effects in free recall, 1999, Journal of experimental psychology. Learning, memory, and cognition.

[41]  Kenji Doya, et al.  Metalearning and neuromodulation, 2002, Neural Networks.

[42]  Marc W. Howard, et al.  Associative Retrieval Processes in Episodic Memory, 2008.

[43]  John G. Kemeny, et al.  Finite Markov chains, 1960.

[44]  M. Kahana.  Associative retrieval processes in free recall, 1996, Memory & cognition.

[45]  Marc W. Howard, et al.  Sequential learning using temporal context, 2009.

[46]  W. Estes.  Statistical theory of spontaneous recovery and regression, 1955, Psychological review.

[47]  J. Raaijmakers, et al.  A model for interference and forgetting, 1988.

[48]  Richard S. Sutton, et al.  Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[49]  James L. McClelland, et al.  The time course of perceptual choice: the leaky, competing accumulator model, 2001, Psychological review.

[50]  William B. Levy, et al.  Interpreting hippocampal function as recoding and forecasting, 2005, Neural Networks.

[51]  Richard S. Sutton, et al.  TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.

[52]  A. D. Redish, et al.  Prediction, sequences and the hippocampus, 2009, Philosophical Transactions of the Royal Society B: Biological Sciences.

[53]  E. Miller, et al.  Prospective Coding for Objects in Primate Prefrontal Cortex, 1999, The Journal of Neuroscience.

[54]  Nash Unsworth, et al.  Individual differences in working memory capacity and episodic retrieval: examining the dynamics of delayed and continuous distractor free recall, 2007, Journal of experimental psychology. Learning, memory, and cognition.

[55]  Vinayak A. Rao, et al.  Bridging the gap: transitive associations between items presented in similar temporal contexts, 2009, Journal of experimental psychology. Learning, memory, and cognition.

[56]  Marc W. Howard, et al.  Constructing Semantic Representations From a Gradually Changing Representation of Temporal Context, 2011, Top. Cogn. Sci.

[57]  Marc W. Howard, et al.  A context-based theory of recency and contiguity in free recall, 2008, Psychological review.

[58]  Demis Hassabis, et al.  The construction system of the brain, 2009, Philosophical Transactions of the Royal Society B: Biological Sciences.