Behavioural and computational evidence for memory consolidation biased by reward-prediction errors

Neural activity encoding recent experiences is replayed during sleep and rest to promote consolidation of the corresponding memories. However, precisely which features of experience influence replay prioritisation to optimise adaptive behaviour remains unclear. Here, we trained adult male rats on a novel maze-based reinforcement learning task designed to dissociate reward outcomes from reward-prediction errors. Four variations of a reinforcement learning model were fitted to the rats’ behaviour over multiple days. Behaviour was best predicted by a model incorporating replay biased by reward-prediction error, compared to the same model with no replay; random replay or reward-biased replay produced poorer predictions of behaviour. This insight disentangles the influences of salience on replay, suggesting that reinforcement learning is tuned by post-learning replay biased by reward-prediction error, not by reward per se. This work therefore provides a behavioural and theoretical toolkit with which to measure and interpret replay in striatal, hippocampal and neocortical circuits.
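
The four model variants named above (no replay, random replay, reward-biased replay, and replay biased by reward-prediction error) can be illustrated with a minimal sketch. It is not the model fitted in the paper, whose details the abstract does not give. It assumes a simple tabular value learner with a delta-rule update, and every name in it (`td_update`, `replay_weights`, `run_session`, the `mode` strings, `alpha`, `n_replays`) is hypothetical. Replay priorities are fixed from the errors recorded during the session, which is a simplifying assumption.

```python
import random


def td_update(q, state, action, reward, alpha=0.1):
    """Delta-rule value update; returns the reward-prediction error (RPE)."""
    rpe = reward - q[(state, action)]
    q[(state, action)] += alpha * rpe
    return rpe


def replay_weights(buffer, mode):
    """Sampling weights over stored experiences for each replay scheme."""
    eps = 1e-6  # keeps every weight strictly positive
    if mode == "random":
        return [1.0] * len(buffer)
    if mode == "reward":
        return [abs(e["reward"]) + eps for e in buffer]
    if mode == "rpe":
        return [abs(e["rpe"]) + eps for e in buffer]
    raise ValueError(f"unknown replay mode: {mode}")


def run_session(trials, mode="rpe", n_replays=20, alpha=0.1, seed=0):
    """Learn online from (state, action, reward) trials, then replay offline.

    mode is one of "none", "random", "reward", "rpe" (hypothetical labels).
    """
    rng = random.Random(seed)
    q, buffer = {}, []
    # Online phase: learn from each trial and store it with its RPE.
    for state, action, reward in trials:
        q.setdefault((state, action), 0.0)
        rpe = td_update(q, state, action, reward, alpha)
        buffer.append(
            {"state": state, "action": action, "reward": reward, "rpe": rpe}
        )
    # Offline phase: re-apply stored experiences, sampled by priority.
    if mode != "none" and buffer:
        weights = replay_weights(buffer, mode)
        for e in rng.choices(buffer, weights=weights, k=n_replays):
            td_update(q, e["state"], e["action"], e["reward"], alpha)
    return q


if __name__ == "__main__":
    # Toy session: one rewarded arm and one unrewarded arm (made-up data).
    trials = [("arm_A", "go", 1.0)] * 5 + [("arm_B", "go", 0.0)] * 5
    for mode in ("none", "random", "reward", "rpe"):
        print(mode, run_session(trials, mode=mode))
```

In this toy session the rewarded arm is also the only source of prediction error, so the reward-biased and error-biased schemes favour the same experiences. A task that dissociates reward outcomes from reward-prediction errors, as the maze task described above is designed to do, is what allows the two schemes to make different behavioural predictions.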
