The roles of online and offline replay in planning

Animals and humans replay neural patterns encoding trajectories through their environment, both whilst they solve decision-making tasks and during rest. Both on-task and off-task replay are believed to contribute to flexible decision making, though how their contributions differ remains unclear. We investigated this question using magnetoencephalography to study human subjects while they performed a decision-making task designed to reveal the decision algorithms employed. We characterized subjects in terms of how flexibly each adjusted their choices to changes in temporal, spatial and reward structure. The more flexible a subject, the more they replayed trajectories during task performance, and this replay was coupled with re-planning of the encoded trajectories. The less flexible a subject, the more they replayed previously and subsequently preferred trajectories during rest periods between task epochs. The data suggest that online and offline replay both participate in planning but support distinct decision strategies.
