Alternative time representation in dopamine models

Dopaminergic neuron activity during learning and appetitive behavior has most commonly been modeled with the temporal-difference (TD) algorithm. However, these models usually require an explicit representation of elapsed time and of the task structure. Most rely on timing elements, such as delay-line representations of time, that are not biologically realistic for intervals on the order of seconds. The interval-timing literature offers several alternatives, one being that timing emerges from general network dynamics rather than from a dedicated circuit. Here, we present a general rate-based learning model, built on long short-term memory (LSTM) networks, that learns a representation of time when the task demands it. Using a naïve network that learns its environment in conjunction with TD, we reproduce dopamine activity in appetitive trace conditioning with a constant CS-US interval, including probe trials with unexpected delays. The model learns a representation of the environment's dynamics in an adaptive, biologically plausible framework, without recourse to delay lines or other special-purpose circuits. Instead, it predicts that the task-dependent representation of time is learned through experience, is encoded in ramp-like changes in single-neuron activity distributed across small neural networks, and reflects a temporal integration mechanism arising from the inherent dynamics of recurrent loops within the network. The model also reproduces the finding that trace conditioning is more difficult than delay conditioning, and that the learned task representation can depend strongly on the types of trials experienced during training. Finally, it suggests that the phasic dopaminergic signal could facilitate learning in the cortex.
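The core computation the abstract refers to, the TD prediction error that phasic dopamine activity is thought to resemble, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's LSTM model: it uses a simple tabular value function over time steps within a trial, with a hypothetical CS at step 0 and US (reward) at step 10. After training, the value estimate ramps toward the reward time and the TD error at reward delivery shrinks, i.e., the reward becomes predicted.

```python
import numpy as np

# Minimal TD(0) sketch of reward prediction in a conditioning trial.
# Assumptions (not from the paper): 12 time steps per trial, US at step 10,
# tabular values indexed by time step rather than a learned LSTM state.
n_steps = 12
reward_t = 10          # step at which the US (reward) is delivered
gamma = 0.98           # temporal discount factor
alpha = 0.1            # learning rate

V = np.zeros(n_steps + 1)   # value estimate for each time step

for trial in range(500):
    for t in range(n_steps):
        r = 1.0 if t == reward_t else 0.0
        # TD error: the quantity compared to phasic dopamine responses
        delta = r + gamma * V[t + 1] - V[t]
        V[t] += alpha * delta

# V now ramps from CS onset toward the reward time: V[0] < ... < V[reward_t].
```

In the full model, the tabular lookup `V[t]` would be replaced by the output of a recurrent (LSTM) network, so the timing information is carried by the network's internal dynamics rather than by an explicit time index.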
