Conditioning and time representation in long short-term memory networks

Dopaminergic models based on the temporal-difference (TD) learning algorithm usually do not differentiate trace conditioning from delay conditioning. Instead, they use a fixed temporal representation of the time elapsed since conditioned stimulus onset. Recently, a new model was proposed in which timing is learned within a long short-term memory (LSTM) artificial neural network representing the cerebral cortex (Rivest et al. in J Comput Neurosci 28(1):107–130, 2010). This paper evaluates that model's ability to reproduce and explain relevant data, as well as its ability to make interesting new predictions. The model reveals strikingly different temporal representations in trace and delay conditioning, because trace conditioning requires working memory to retain the past conditioned stimulus, whereas delay conditioning does not. On the other hand, the model predicts no substantial difference in dopamine (DA) responses between the two conditions when it is trained on one conditioning paradigm and tested on the other. The model predicts that in trace conditioning, the animal's timing starts at conditioned stimulus offset rather than at its onset. In classical conditioning, it predicts that if the conditioned stimulus does not disappear after the reward, the animal may expect a second reward. Finally, the last simulation reveals that the buildup of activity of some units in the network can adapt to new delays by adjusting their rate of integration. Most importantly, the paper shows that, with the proposed architecture, discharge patterns similar to those observed in dopaminergic neurons and in the cerebral cortex on these tasks can be acquired simply by minimizing a predictive cost function.
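For reference, the fixed-representation baseline that the abstract contrasts with learned timing can be sketched as tabular TD(0) with a "complete serial compound" (CSC) representation, in which every time step since conditioned stimulus (CS) onset has its own weight. This is a minimal sketch of that standard baseline under stated assumptions, not the paper's LSTM model: the trial layout (20 steps, CS onset at step 5, reward at step 15), the learning rate `alpha` and the discount factor `gamma` are illustrative choices, and the function names are hypothetical. Because the features depend only on time since CS onset, a trace trial and a delay trial would present identical inputs to this learner, which is why such models do not distinguish the two paradigms.

```python
# Minimal TD(0) sketch with a fixed complete serial compound (CSC) time
# representation: one weight per time step elapsed since CS onset.
# Assumptions (illustrative only): discrete time steps, one CS and one reward
# per trial, one-hot (tabular) features. This is NOT the paper's LSTM model.


def value(w, t, cs_on):
    """Predicted value at step t: zero before CS onset, else w[t - cs_on]."""
    k = t - cs_on
    return w[k] if 0 <= k < len(w) else 0.0


def run_trial(w, cs_on, reward_t, n_steps, alpha=0.5, gamma=0.98):
    """Run one trial, update w in place, and return the TD errors.

    deltas[t] is the prediction error on the transition from step t to t + 1,
    delta = r + gamma * V(t + 1) - V(t), the dopamine-like signal.
    """
    deltas = []
    for t in range(n_steps - 1):
        r = 1.0 if t + 1 == reward_t else 0.0
        delta = r + gamma * value(w, t + 1, cs_on) - value(w, t, cs_on)
        if t >= cs_on:  # only steps after CS onset have a CSC weight
            w[t - cs_on] += alpha * delta
        deltas.append(delta)
    return deltas


if __name__ == "__main__":
    n_steps, cs_on, reward_t = 20, 5, 15  # hypothetical trial layout
    w = [0.0] * (n_steps - cs_on)
    first = run_trial(w, cs_on, reward_t, n_steps)
    for _ in range(500):
        run_trial(w, cs_on, reward_t, n_steps)
    last = run_trial(w, cs_on, reward_t, n_steps)
    # Error on entering the CS step and on entering the reward step.
    print("before learning: CS %.2f  reward %.2f" % (first[cs_on - 1], first[reward_t - 1]))
    print("after learning:  CS %.2f  reward %.2f" % (last[cs_on - 1], last[reward_t - 1]))
```

On the first trial the TD error is 1 at reward delivery and 0 at CS onset; after training, the error at reward falls to about 0 and a positive error of about gamma^10 = 0.98^10 ≈ 0.82 appears at CS onset, the classic transfer of the dopamine-like response from reward to CS. The timing here is hard-wired by the CSC features rather than learned, which is the limitation the LSTM model is meant to address.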

[1] Tadashi Yamazaki et al. The cerebellum as a liquid state machine. Neural Networks, 2007.

[2] Charles R. Gallistel et al. Memory and the Computational Brain: Why Cognitive Science will Transform Neuroscience. 2009.

[3] Geoffrey E. Hinton et al. Learning internal representations by error propagation. 1986.

[4] C. Gallistel et al. Memory and the Computational Brain. 2009.

[5] W. Schultz et al. Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Experimental Brain Research, 1998.

[6] Jürgen Schmidhuber et al. Learning to forget: continual prediction with LSTM. 1999.

[7] Yoshua Bengio et al. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Networks, 1994.

[8] J. Hollerman et al. Dopamine neurons report an error in the temporal prediction of reward during learning. Nature Neuroscience, 1998.

[9] P. Shizgal et al. Prolonged rewarding stimulation of the rat medial forebrain bundle: neurochemical and behavioral consequences. Behavioral Neuroscience, 2006.

[10] Yoshua Bengio et al. Adaptive Drift-Diffusion Process to Learn Time Intervals. arXiv:1103.2382, 2011.

[11] M. Nicolelis et al. Decoding of temporal intervals from cortical ensemble activity. Journal of Neurophysiology, 2008.

[12] C. Gallistel et al. Time, rate, and conditioning. Psychological Review, 2000.

[13] M. A. Steinmetz et al. Neuronal activity in posterior parietal area 7a during the delay periods of a spatial memory task. Journal of Neurophysiology, 1996.

[14] Joshua W. Brown et al. How the Basal Ganglia Use Parallel Excitatory and Inhibitory Learning Pathways to Selectively Respond to Unexpected Rewarding Cues. The Journal of Neuroscience, 1999.

[15] W. Schultz et al. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. The Journal of Neuroscience, 1993.

[16] R. Romo et al. Timing and neural encoding of somatosensory parametric working memory in macaque prefrontal cortex. Cerebral Cortex, 2003.

[17] U. Karmarkar et al. Timing in the Absence of Clocks: Encoding Time in Neural Network States. Neuron, 2007.

[18] Jürgen Schmidhuber et al. Long Short-Term Memory. Neural Computation, 1997.

[19] Richard S. Sutton et al. Time-Derivative Models of Pavlovian Reinforcement. 1990.

[20] P. Goldman-Rakic et al. Mnemonic coding of visual space in the monkey's dorsolateral prefrontal cortex. Journal of Neurophysiology, 1989.

[21] Dean V. Buonomano et al. A learning rule for the emergence of stable dynamics and timing in recurrent networks. Journal of Neurophysiology, 2005.

[22] Jürgen Schmidhuber et al. Learning Precise Timing with LSTM Recurrent Networks. J. Mach. Learn. Res., 2003.

[23] W. Senn et al. Climbing Neuronal Activity as an Event-Based Cortical Representation of Time. The Journal of Neuroscience, 2004.

[24] L. F. Abbott et al. Generating Coherent Patterns of Activity from Chaotic Neural Networks. Neuron, 2009.

[25] Peter Ford Dominey et al. Encoding behavioral context in recurrent networks of the fronto-striatal system: a simulation study. Cognitive Brain Research, 1997.

[26] E. Vaadia et al. Coincident but Distinct Messages of Midbrain Dopamine and Striatal Tonically Active Neurons. Neuron, 2004.

[27] Cristina Lucchetti et al. Time-modulated neuronal activity in the premotor cortex of macaque monkeys. Experimental Brain Research, 2001.

[28] Daniel Bullock et al. A Scalable Model of Cerebellar Adaptive Timing and Sequencing: The Recurrent Slide and Latch (RSL) Model. Applied Intelligence, 2002.

[29] P. Dayan et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. The Journal of Neuroscience, 1996.

[30] Andre Luzardo et al. An adaptive drift-diffusion model of interval timing dynamics. Behavioural Processes, 2013.

[31] A. Machado. Learning the temporal dynamics of behavior. Psychological Review, 1997.

[32] Louis D. Matzel et al. The Role of the Hippocampus in Trace Conditioning: Temporal Discontinuity or Task Difficulty? Neurobiology of Learning and Memory, 2001.

[33] W. Meck et al. Cortico-striatal circuits and interval timing: coincidence detection of oscillatory processes. Cognitive Brain Research, 2004.

[34] J. Gibbon. Scalar expectancy theory and Weber's law in animal timing. 1977.

[35] Alessandro Ulrici et al. Dorsal premotor areas of nonhuman primate: functional flexibility in time domain. European Journal of Applied Physiology, 2005.

[36] K. Nakamura et al. Lateral hypothalamus neuron involvement in integration of natural and artificial rewards and cue signals. Journal of Neurophysiology, 1986.

[37] Richard S. Sutton et al. Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. Neural Computation, 2008.

[38] F. Crépel et al. Dopaminergic modulation of long-term synaptic plasticity in rat prefrontal neurons. Cerebral Cortex, 2003.

[39] M. Shadlen et al. Representation of Time by Neurons in the Posterior Parietal Cortex of the Macaque. Neuron, 2003.

[40] David S. Touretzky et al. Representation and Timing in Theories of the Dopamine System. Neural Computation, 2006.

[41] Catalin V. Buhusi et al. Interval timing as an emergent learning property. Psychological Review, 2003.

[42] W. Schultz et al. Responses of monkey dopamine neurons during learning of behavioral reactions. Journal of Neurophysiology, 1992.

[43] Jonathan D. Cohen et al. A Model of Interval Timing by Neural Integration. The Journal of Neuroscience, 2011.

[44] M. Gabriel et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks. 1990.

[45] K. Allen et al. Dorsal, ventral, and complete excitotoxic lesions of the hippocampus in rats failed to impair appetitive trace conditioning. Behavioural Brain Research, 2007.

[46] R. Romo et al. Neuronal correlates of parametric working memory in the prefrontal cortex. Nature, 1999.

[47] Yoshua Bengio et al. Alternative time representation in dopamine models. Journal of Computational Neuroscience, 2009.

[48] W. Newsome et al. The temporal precision of reward prediction in dopamine neurons. Nature Neuroscience, 2008.

[49] Richard S. Sutton et al. Introduction to Reinforcement Learning. 1998.

[50] W. H. Meck et al. Timing for the absence of a stimulus: the gap paradigm reversed. Journal of Experimental Psychology: Animal Behavior Processes, 2000.

[51] W. Pan et al. Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network. The Journal of Neuroscience, 2005.

[52] T. Ono et al. Retrospective and prospective coding for predicted reward in the sensory thalamus. Nature, 2001.

[53] Peter Dayan et al. A Neural Substrate of Prediction and Reward. Science, 1997.

[54] E. Wasserman et al. Cyclic responding by pigeons on the peak timing procedure. Journal of Experimental Psychology: Animal Behavior Processes, 1996.

[55] S. Wise et al. Premotor cortex of the rhesus monkey: neuronal activity in anticipation of predictable environmental events. Experimental Brain Research, 2004.

[56] Christopher Miall et al. The Storage of Time Intervals Using Oscillating Neurons. Neural Computation, 1989.

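[57] François Rivest et al. Modèle informatique du coapprentissage des ganglions de la base et du cortex : l'apprentissage par renforcement et le développement de représentations [Computational model of the co-learning of the basal ganglia and the cortex: reinforcement learning and the development of representations]. 2010.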

[58] Geoffrey M. Ghose et al. Temporal Production Signals in Parietal Cortex. PLoS Biology, 2012.

[59] David J. Willshaw et al. Adaptive leaky integrator models of cerebellar Purkinje cells can learn the clustering of temporal patterns. Neurocomputing, 1999.

[60] Catalin V. Buhusi et al. What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience, 2005.

[61] Richard S. Sutton et al. Learning to predict by the methods of temporal differences. Machine Learning, 1988.

[62] Michael J. Frank et al. Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia. Neural Computation, 2006.

[63] Elliot A. Ludvig et al. Magnitude and timing of conditioned responses in delay and trace classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus). Behavioral Neuroscience, 2009.

[64] P. Balsam et al. Timing at the Start of Associative Learning. 2002.

[65] Peter R. Killeen et al. Temporal generalization accounts for response resurgence in the peak procedure. Behavioural Processes, 2007.

[66] Alex M. Andrew et al. Robot Learning, edited by Jonathan H. Connell and Sridhar Mahadevan, Kluwer, Boston, 1993/1997, xii+240 pp., ISBN 0-7923-9365-1 (Hardback, 218.00 Guilders, $120.00, £89.95). Robotica, 1999.

[67] J. Gibbon et al. Timing and time perception. Annals of the New York Academy of Sciences, 1984.

[68] Florentin Wörgötter et al. Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison. Biological Cybernetics, 2008.

[69] Masataka Watanabe et al. Prefrontal and cingulate unit activity during timing behavior in the monkey. Brain Research, 1979.

[70] C. Gallistel et al. Acquisition of peak responding: What is learned? Behavioural Processes, 2009.

[71] Richard S. Sutton et al. A computational model of hippocampal function in trace conditioning. NIPS, 2008.

[72] W. Schultz et al. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience, 1999.

[73] R. M. Church et al. Scalar Timing in Memory. Annals of the New York Academy of Sciences, 1984.

[74] John E. Schlerf et al. Dedicated and intrinsic models of time perception. Trends in Cognitive Sciences, 2008.