Reinforcement Learning and Hippocampal Dynamics

Recent experimental findings on hippocampal representational dynamics, such as route replay and sweeps, match intuitive notions from reinforcement learning, including the transient representation of potential trajectories and reward locations. We explore these intuitions within a formal reinforcement learning framework and examine how these representational dynamics might be integrated with reinforcement learning algorithms. We suggest that hippocampal representational dynamics are best integrated within a model-based reinforcement learning framework, and we show how this framework can be used to derive specific quantitative predictions about the control processes that direct and use hippocampal representations.
