Time representation in reinforcement learning models of the basal ganglia

Reinforcement learning (RL) models have been influential in understanding many aspects of basal ganglia function, from reward prediction to action selection. Time plays an important role in these models, but there is still no theoretical consensus about what kind of time representation the basal ganglia use. We review several theoretical accounts and their supporting evidence. We then discuss the relationship between RL models and the timing mechanisms that have been attributed to the basal ganglia. We hypothesize that a single computational system may underlie both RL and interval timing, the perception of duration in the range of seconds to hours. This hypothesis, which extends earlier models by incorporating a time-sensitive action selection mechanism, may have important implications for understanding disorders such as Parkinson's disease, in which both decision making and timing are impaired.
