Evaluating the TD model of classical conditioning

The temporal-difference (TD) algorithm from reinforcement learning provides a simple method for incrementally learning predictions of upcoming events. Applied to classical conditioning, TD models suppose that animals learn a real-time prediction of the unconditioned stimulus (US) on the basis of all available conditioned stimuli (CSs). In the TD model, as in other error-correction models, learning is driven by prediction errors, which combine the actual US with the moment-to-moment change in US prediction. With the TD model, however, learning occurs continuously from moment to moment rather than being artificially constrained to discrete trials. Accordingly, a key feature of any TD model is its assumption about how a CS is represented on a moment-to-moment basis. Here, we evaluate the performance of the TD model on a heretofore unexplored range of classical conditioning tasks. To do so, we consider three stimulus representations that vary in their degree of temporal generalization and examine how the representation influences the TD model's performance on these tasks.
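As an illustration of the moment-to-moment learning described above, the sketch below implements linear TD(λ), in which the US prediction at time t is V_t = w · x_t and the prediction error is δ_t = r_{t+1} + γV_{t+1} − V_t (the US received plus the discounted change in prediction). It is a minimal sketch under stated assumptions, not the exact model evaluated here: the trial timing, the parameter values (α, γ, λ), the use of accumulating eligibility traces, and the two feature constructions are all illustrative choices. The two representations shown are a complete serial compound (one feature per time step since CS onset, no temporal generalization) and a single "presence" feature (one feature for the whole CS, maximal generalization); the helper names `csc_features`, `presence_features`, and `run_td_lambda` are hypothetical.

```python
import numpy as np


def csc_features(t, cs_onset, n_features):
    """Complete serial compound (assumed form): a one-hot vector whose
    active element indexes the number of time steps since CS onset."""
    x = np.zeros(n_features)
    k = t - cs_onset
    if 0 <= k < n_features:
        x[k] = 1.0
    return x


def presence_features(t, cs_onset, cs_offset):
    """Presence representation: one feature that is 1 while the CS is on."""
    return np.array([1.0 if cs_onset <= t < cs_offset else 0.0])


def run_td_lambda(features, us, n_trials, alpha=0.1, gamma=0.97, lam=0.9):
    """Linear TD(lambda) with accumulating eligibility traces.

    features[t] is the feature vector x_t at time step t of a trial and
    us[t] is the US intensity at that step. Each trial is treated as an
    independent episode (traces are reset), which assumes a long
    inter-trial interval. Returns the learned weights and the US
    prediction V_t = w . x_t at every step of a trial after training.
    """
    w = np.zeros(len(features[0]))
    for _ in range(n_trials):
        e = np.zeros_like(w)  # eligibility trace, reset at trial start
        for t in range(len(features) - 1):
            x, x_next = features[t], features[t + 1]
            # TD error: US received plus discounted change in prediction.
            delta = us[t + 1] + gamma * (w @ x_next) - w @ x
            e = gamma * lam * e + x  # decay old traces, mark current features
            w = w + alpha * delta * e
    predictions = np.array([w @ x for x in features])
    return w, predictions


# Hypothetical delay-conditioning trial: CS on at step 10, US at step 30.
T, CS_ON, US_T = 50, 10, 30
us = np.zeros(T)
us[US_T] = 1.0

csc = [csc_features(t, CS_ON, T - CS_ON) for t in range(T)]
presence = [presence_features(t, CS_ON, US_T) for t in range(T)]

w_csc, p_csc = run_td_lambda(csc, us, n_trials=500)
w_pres, p_pres = run_td_lambda(presence, us, n_trials=500)

print("CSC prediction during CS:     ", np.round(p_csc[CS_ON:US_T], 2))
print("Presence prediction during CS:", np.round(p_pres[CS_ON:US_T], 2))
```

With the complete serial compound, every time step has its own weight, so the learned prediction ramps up toward the US, approaching γ^(US_T − 1 − t). With the single presence feature, every CS moment shares one weight, so the prediction is flat across the CS. The contrast shows, in a toy setting, why the degree of temporal generalization built into the representation constrains what a TD model can learn about timing.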
