Dopamine and inference about timing

Several investigators have suggested that the primate dopamine system carries an error signal for learning to predict future rewards. These models, based on temporal-difference (TD) learning, explain most phasic responses of primate dopamine neurons in appetitive conditioning; moreover, they suggest a neurophysiological account of animal conditioning behavior. But because existing models are based in the simple formal setting of Markov processes, they are deficient in at least two areas relevant to physiological and behavioral data. They do not provide a realistic account of the partial observability of the state of the world, nor of how the system tracks the timing of events. In this paper, we introduce a version of TD learning grounded in a richer formal model to better address both issues and, consequently, to explain some data that challenge existing models.
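For orientation, here is a minimal sketch of the kind of Markov-setting TD model the abstract describes as existing work, not the richer model this paper introduces. It assumes a tapped-delay-line representation (one learned value per time step after cue onset), a fixed cue-reward interval, a baseline state whose value is held at zero, and illustrative settings `alpha=0.1` and `gamma=1.0`. The function names `td_error` and `simulate` are hypothetical.

```python
def td_error(reward, v_next, v_current, gamma=1.0):
    """Temporal-difference error: delta = r + gamma * V(next) - V(current)."""
    return reward + gamma * v_next - v_current


def simulate(n_trials=1000, n_steps=10, alpha=0.1, gamma=1.0):
    """Run TD(0) on repeated cue-reward trials with a fixed interval.

    Returns the per-step TD errors of the first and last trial.
    Index 0 is the error at cue onset; the final index is the error
    at reward delivery.
    """
    # values[t]: predicted future reward t steps after cue onset.
    values = [0.0] * n_steps
    first = last = None
    for trial in range(n_trials):
        errors = []
        # Cue onset: transition from a baseline state (value assumed
        # fixed at 0, since the cue itself is unpredicted) into values[0].
        errors.append(td_error(0.0, values[0], 0.0, gamma))
        for t in range(n_steps):
            is_last = t == n_steps - 1
            reward = 1.0 if is_last else 0.0           # reward ends the trial
            v_next = 0.0 if is_last else values[t + 1]  # terminal value is 0
            delta = td_error(reward, v_next, values[t], gamma)
            values[t] += alpha * delta                  # TD(0) update
            errors.append(delta)
        if trial == 0:
            first = errors
        last = errors
    return first, last


if __name__ == "__main__":
    first, last = simulate()
    print("first trial:", [round(e, 3) for e in first])
    print("last trial: ", [round(e, 3) for e in last])
```

Under these assumptions the error on the first trial appears at reward delivery, and after training it appears at cue onset instead, which is the qualitative pattern usually compared with phasic dopamine responses. Because each time step has its own fully observed state and the interval never varies, the sketch also shows the two simplifications the abstract identifies as limitations: no partial observability and no real account of timing.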
