Influence of Reward Delays on Responses of Dopamine Neurons

Psychological and microeconomic studies have shown that outcome values are discounted by imposed delays. The effect, called temporal discounting, is demonstrated typically by choice preferences for sooner smaller rewards over later larger rewards. However, it is unclear whether temporal discounting occurs during the decision process when differently delayed reward outcomes are compared or during predictions of reward delays by pavlovian conditioned stimuli without choice. To address this issue, we investigated the temporal discounting behavior in a choice situation and studied the effects of reward delay on the value signals of dopamine neurons. The choice behavior confirmed hyperbolic discounting of reward value by delays on the order of seconds. Reward delay reduced the responses of dopamine neurons to pavlovian conditioned stimuli according to a hyperbolic decay function similar to that observed in choice behavior. Moreover, the stimulus responses increased with larger reward magnitudes, suggesting that both delay and magnitude constituted viable components of dopamine value signals. In contrast, dopamine responses to the reward itself increased with longer delays, possibly reflecting temporal uncertainty and partial learning. These dopamine reward value signals might serve as useful inputs for brain mechanisms involved in economic choices between delayed rewards.
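As a rough illustration of the hyperbolic discounting described above, the sketch below assumes the standard one-parameter hyperbolic form V = A / (1 + kD), where A is reward magnitude, D is delay in seconds, and k is a discount rate. The abstract does not give the fitted function or its parameters, so the function name, the value of k, and the reward magnitudes and delays are all hypothetical and chosen only to show how a smaller, sooner reward can end up with the higher discounted value.

```python
def hyperbolic_value(magnitude, delay_s, k):
    """Discounted value under the assumed hyperbolic form V = A / (1 + k * D)."""
    if delay_s < 0 or k < 0:
        raise ValueError("delay_s and k must be non-negative")
    return magnitude / (1.0 + k * delay_s)


# Hypothetical discount rate (per second); not a value reported in the study.
K = 0.5

# Hypothetical options: magnitudes in arbitrary units, delays in seconds.
sooner_smaller = hyperbolic_value(magnitude=1.0, delay_s=2.0, k=K)
later_larger = hyperbolic_value(magnitude=2.0, delay_s=16.0, k=K)

# With these illustrative numbers the sooner, smaller reward is valued more.
prefers_sooner = sooner_smaller > later_larger
print(sooner_smaller, later_larger, prefers_sooner)
```

With a smaller k the same comparison can reverse in favor of the later, larger reward, which is why the discount rate has to be estimated from behavior rather than assumed.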
