Dopamine transients do not act as model-free prediction errors during associative learning

Dopamine neurons are proposed to signal the reward prediction error in model-free reinforcement learning algorithms. This term represents the unpredicted or ‘excess’ value of the rewarding event, value that is then added to the intrinsic value of any antecedent cues, contexts or events. To support this proposal, proponents cite evidence that artificially induced dopamine transients cause lasting changes in behavior. Yet these studies do not generally assess learning under conditions where an endogenous prediction error would occur. Here, to address this, we conducted three experiments in which we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into associations with the later events, whether valueless cues or valued rewards. These results show that in learning situations appropriate for the appearance of a prediction error, dopamine transients support associative, rather than model-free, learning.
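The model-free account described above can be sketched as a temporal-difference (TD(0)) learner. This is a minimal illustration, not the authors' model: it assumes a trial is an ordered list of states, uses hypothetical names (`td_trial`, `stim`), and treats optogenetic stimulation as an extra positive term added to the prediction error. Under that assumption, a cue followed by stimulation acquires cached value even when the later event has none. That is the model-free prediction the experiments tested and, per the abstract, did not observe.

```python
def td_trial(values, states, rewards, alpha=0.1, gamma=1.0, stim=None):
    """Run one trial of TD(0) learning and return the prediction errors.

    values  : dict mapping state name -> cached value V(s), updated in place
    states  : ordered list of states visited during the trial
    rewards : rewards[i] is the reward received on leaving states[i]
    stim    : optional dict mapping state -> extra term added to the error
              computed at that state's transition (hypothetical stand-in for
              an artificially induced dopamine transient)
    """
    stim = stim or {}
    deltas = []
    for i, s in enumerate(states):
        # Cached value of the successor state; 0.0 once the trial ends.
        v_next = values.get(states[i + 1], 0.0) if i + 1 < len(states) else 0.0
        # Model-free prediction error: reward plus discounted next value,
        # minus the value already cached for the current state.
        delta = rewards[i] + gamma * v_next - values.get(s, 0.0)
        # Assumed model of stimulation: it simply adds to the error.
        delta += stim.get(s, 0.0)
        # The "excess" value is added to the antecedent state's cached value.
        values[s] = values.get(s, 0.0) + alpha * delta
        deltas.append(delta)
    return deltas


# Ordinary conditioning: a cue followed by reward.
v = {}
for _ in range(200):
    td_trial(v, ["cue"], [1.0])
print(round(v["cue"], 3))  # prints 1.0

# Cue A followed by valueless cue B, with stimulation at the A -> B transition.
v2 = {}
for _ in range(200):
    td_trial(v2, ["A", "B"], [0.0, 0.0], stim={"A": 1.0})
# A acquires cached value; B, which never predicts reward, stays at 0.
print(round(v2["A"], 3), v2["B"])
```

In this sketch nothing links A to B itself; the only thing learned is a scalar value for A. An associative account instead predicts that A comes to evoke a representation of B, which is the kind of learning the abstract reports.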