Associative Learning from Replayed Experience

We develop an extension of the Rescorla-Wagner model of associative learning. In addition to learning from the current trial, the new model supposes that animals store and replay previous trials, and learn from the replayed trials using the same learning rule. This simple idea provides a unified explanation for diverse phenomena that have proved challenging for earlier associative models, including spontaneous recovery, latent inhibition, retrospective revaluation, and trial-spacing effects. For example, spontaneous recovery is explained by supposing that the animal replays its previous trials during the interval between extinction and test. The replayed trials include earlier acquisition trials as well as recent extinction trials, so the conditioned response is gradually re-acquired. We present simulation results for the simplest version of this replay idea, in which the trial memory is assumed to be empty at the beginning of an experiment, all experienced trials are stored and none are removed, and trials are sampled from the memory at random. Even this minimal replay model is able to explain the challenging phenomena, illustrating the explanatory power of an associative model enhanced by learning from remembered as well as real experiences.
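The spontaneous-recovery account can be illustrated with a minimal sketch. It is not the simulation code behind the reported results. It assumes a single conditioned stimulus, a standard Rescorla-Wagner update, and uniform random sampling from a memory of all experienced trials. The function names, trial counts, learning rates, and number of replays are hypothetical values chosen for illustration, and for simplicity replay is applied only during the interval between extinction and test.

```python
import random


def rw_update(weights, stimuli, outcome, rate):
    """One Rescorla-Wagner step: nudge the summed prediction toward the outcome."""
    error = outcome - sum(weights[s] for s in stimuli)
    for s in stimuli:
        weights[s] += rate * error


def run_trials(weights, memory, trials, rate):
    """Learn from real trials and store each one (none are ever removed)."""
    for stimuli, outcome in trials:
        rw_update(weights, stimuli, outcome, rate)
        memory.append((stimuli, outcome))


def replay(weights, memory, n_replays, rate, rng):
    """Learn from stored trials sampled uniformly at random, using the same rule."""
    for _ in range(n_replays):
        stimuli, outcome = rng.choice(memory)
        rw_update(weights, stimuli, outcome, rate)


def spontaneous_recovery_demo(seed=0):
    # All numeric settings below are illustrative assumptions.
    rng = random.Random(seed)
    weights = {"CS": 0.0}
    memory = []  # trial memory starts empty

    run_trials(weights, memory, [(("CS",), 1.0)] * 50, rate=0.1)  # acquisition
    after_acquisition = weights["CS"]

    run_trials(weights, memory, [(("CS",), 0.0)] * 50, rate=0.1)  # extinction
    after_extinction = weights["CS"]

    # Interval between extinction and test: replay only, no real trials.
    replay(weights, memory, n_replays=200, rate=0.05, rng=rng)
    after_interval = weights["CS"]

    return after_acquisition, after_extinction, after_interval


if __name__ == "__main__":
    acq, ext, rec = spontaneous_recovery_demo()
    print(f"after acquisition: {acq:.3f}")
    print(f"after extinction:  {ext:.3f}")
    print(f"after interval:    {rec:.3f}")
```

Because the memory holds acquisition and extinction trials in equal numbers here, replay pulls the associative strength away from its extinguished value and toward an intermediate level, which is the partial recovery described above. Changing the ratio of stored trial types or the number of replays would change how much strength is recovered.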
