Experience replay supports non-local learning

To make effective decisions, we need to consider the relationship between actions and outcomes. These are, however, often separated by time and space. The biological mechanism capable of spanning those gaps remains unknown. One promising, albeit hypothetical, mechanism involves neural replay of non-local experience. Using a novel task that segregates direct from indirect learning, combined with magnetoencephalography (MEG), we tested the role of neural replay in non-local learning in humans. Following reward receipt, we found significant backward replay of non-local experience, with a 160 ms state-to-state time lag, and this replay facilitated learning of action values. This backward replay, together with behavioural evidence of non-local learning, was more pronounced for experiences that were of greater benefit for future behaviour, as predicted by theories of prioritization. These findings establish rationally targeted non-local replay as a neural mechanism for solving complex credit assignment problems during learning.

One Sentence Summary: Reverse sequential replay is found, for the first time, to support non-local reinforcement learning in humans and is prioritized according to utility.
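As an illustration of why replay in reverse order is a natural candidate for credit assignment, the following minimal sketch applies one sweep of temporal-difference updates to a remembered three-state path, once in backward and once in forward order. It is not the task or the computational model used in the study: the function name `replay_update`, the state labels, the learning rate `alpha`, the discount `gamma`, and the use of state values rather than action values are all assumptions made for illustration.

```python
def replay_update(values, path, reward, alpha=0.5, gamma=1.0, backward=True):
    """Apply one TD(0) sweep over a remembered path (hypothetical helper).

    values:   dict mapping state -> current value estimate (not mutated)
    path:     states in the order they were visited; the reward is
              delivered on leaving the final state
    backward: if True, replay the steps from the rewarded end backwards
    """
    values = dict(values)  # work on a copy so callers can compare orders
    # Intermediate steps carry no reward; the final step carries `reward`
    # and has no successor state.
    steps = [(s, s_next, 0.0) for s, s_next in zip(path[:-1], path[1:])]
    steps.append((path[-1], None, reward))
    if backward:
        steps.reverse()
    for state, next_state, r in steps:
        bootstrap = gamma * values[next_state] if next_state is not None else 0.0
        values[state] += alpha * (r + bootstrap - values[state])
    return values


path = ["A", "B", "C"]
initial = {state: 0.0 for state in path}

backward_values = replay_update(initial, path, reward=1.0, backward=True)
forward_values = replay_update(initial, path, reward=1.0, backward=False)

print("backward:", backward_values)
print("forward: ", forward_values)
```

In this toy setting a single backward sweep carries some of the reward signal all the way to the first state, whereas a single forward sweep updates only the state adjacent to the reward. Earlier states would need further sweeps to be updated in the forward case.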
