Dopamine neurons learn relative chosen value from probabilistic rewards

Economic theories posit reward probability as one of the factors defining reward value. Individuals learn the value of cues that predict probabilistic rewards from experienced reward frequencies. Building on the notion that responses of dopamine neurons increase with reward probability and expected value, we asked how dopamine neurons in monkeys acquire this value signal, which may represent an economic decision variable. In a Pavlovian learning task, we found that reward probability-dependent value signals arose from experienced reward frequencies. We then assessed neuronal response acquisition during choices among probabilistic rewards. Here, dopamine responses became sensitive to the value of both chosen and unchosen options. Both experiments also showed novelty responses of dopamine neurons that decreased as learning advanced. These results show that dopamine neurons acquire predictive value signals from the frequency of experienced rewards. This flexible and fast signal reflects a specific decision variable and could update neuronal decision mechanisms. DOI: http://dx.doi.org/10.7554/eLife.18044.001
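The acquisition of value signals from experienced reward frequencies described above is commonly modeled with a delta rule (Rescorla–Wagner), in which a prediction error drives the value update. The sketch below is an illustrative simulation under that standard model, not the paper's own analysis; the function name and parameter values are hypothetical.

```python
import random

def rescorla_wagner(p_reward, alpha=0.05, n_trials=2000, seed=0):
    """Delta-rule value learning: V <- V + alpha * (r - V).

    With binary rewards delivered at probability p_reward, the value
    estimate V converges (in expectation) to p_reward, i.e. the cue's
    value is learned from experienced reward frequency alone.
    """
    rng = random.Random(seed)
    v = 0.0
    for _ in range(n_trials):
        r = 1.0 if rng.random() < p_reward else 0.0
        prediction_error = r - v       # dopamine-like reward prediction error
        v += alpha * prediction_error  # incremental value update
    return v

# Higher reward probability yields a higher learned value estimate:
for p in (0.25, 0.50, 0.75):
    print(p, round(rescorla_wagner(p), 2))
```

Note that the learning rate `alpha` trades off speed of acquisition against steady-state noise in the value estimate, which is why the simulation uses a small value and many trials.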
