Dopamine neurons report an error in the temporal prediction of reward during learning

Many behaviors are affected by rewards: they undergo long-term changes when rewards differ from what was predicted but remain unchanged when rewards occur exactly as predicted. The discrepancy between reward occurrence and reward prediction is termed an 'error in reward prediction'. Dopamine neurons in the substantia nigra and the ventral tegmental area are believed to be involved in reward-dependent behaviors. Consistent with this role, they are activated by rewards, and because they are activated more strongly by unpredicted than by predicted rewards, they may play a role in learning. The present study investigated whether monkey dopamine neurons code an error in reward prediction during the course of learning. Dopamine neuron responses reflected the changes in reward prediction during individual learning episodes: the neurons were activated by rewards during early trials, when errors were frequent and rewards unpredictable, but this activation was progressively reduced as performance was consolidated and rewards became more predictable. These neurons were also activated when rewards occurred at unpredicted times and were depressed when rewards were omitted at the predicted times. Thus, dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the teaching signals that have been employed in particularly efficient computational learning models.
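The notion of an error in reward prediction can be illustrated with a minimal sketch. The code below is not the model or analysis used in the study; it assumes a simple delta-rule learner in which the error on each trial is the delivered reward minus the current prediction, and the prediction is then nudged toward the reward. The function name `simulate_prediction_errors`, the learning rate of 0.2, and the trial counts are illustrative assumptions. The sketch covers only the occurrence of reward, not its timing within a trial.

```python
def simulate_prediction_errors(rewards, learning_rate=0.2):
    """Return the per-trial prediction error (reward minus prediction).

    Hypothetical delta-rule learner, used here only for illustration.
    """
    prediction = 0.0  # no reward is expected before learning starts
    errors = []
    for reward in rewards:
        error = reward - prediction          # error in reward prediction
        errors.append(error)
        prediction += learning_rate * error  # move the prediction toward the outcome
    return errors


# 30 rewarded trials followed by one trial on which the reward is omitted.
rewards = [1.0] * 30 + [0.0]
errors = simulate_prediction_errors(rewards)

print(f"first rewarded trial: {errors[0]:+.3f}")
print(f"30th rewarded trial:  {errors[29]:+.3f}")
print(f"omitted reward:       {errors[30]:+.3f}")
```

Under these assumptions the error is large and positive on the first rewarded trial, shrinks toward zero as the reward becomes predictable, and turns negative when a predicted reward is omitted, which parallels the activation, reduced activation, and depression described above.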
