Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors

Reward-evoked dopamine is well established as a prediction error. However, the central tenet of temporal difference accounts, that similar transients evoked by reward-predictive cues also function as errors, remains untested. To address this, we used two phenomena, second-order conditioning and blocking, to examine the role of dopamine in prediction error versus reward prediction. We show that optogenetically shunting dopamine activity at the start of a reward-predicting cue prevents second-order conditioning without affecting blocking. These results support temporal difference accounts by providing causal evidence that cue-evoked dopamine transients function as prediction errors.
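To make the temporal difference claim concrete, the following is a minimal sketch of textbook TD(0) learning on a single cue-reward trial structure. It is an illustration only, not the authors' model or analysis. The function name `simulate_td`, its parameters, and the assumption that the pre-cue state has a fixed value of zero (because cue onset is unpredictable) are all hypothetical choices made for this example. The error at each step is delta = r + gamma * V(next) - V(current).

```python
def simulate_td(n_trials=200, n_steps=5, alpha=0.2, gamma=1.0, reward=1.0):
    """Toy TD(0) simulation (hypothetical, for illustration only).

    The cue is represented as n_steps consecutive time-step states, and
    reward arrives on the last one. Returns a list of
    (error_at_cue_onset, error_at_reward) pairs, one per trial.
    """
    # values[i] is the learned value of the i-th time step after cue onset.
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        # Assumption: the pre-cue state has value 0 and is never updated,
        # because the animal cannot predict when the cue will appear.
        cue_error = gamma * values[0] - 0.0

        reward_error = 0.0
        for i in range(n_steps):
            is_last = i == n_steps - 1
            next_value = 0.0 if is_last else values[i + 1]
            r = reward if is_last else 0.0
            # Temporal difference error for this time step.
            delta = r + gamma * next_value - values[i]
            values[i] += alpha * delta
            if is_last:
                reward_error = delta
        history.append((cue_error, reward_error))
    return history


if __name__ == "__main__":
    history = simulate_td()
    print("first trial (cue error, reward error):", history[0])
    print("last trial  (cue error, reward error):", history[-1])
```

In this sketch the error is large at reward and absent at cue onset on the first trial; with training it shrinks at reward and grows at cue onset. That cue-onset error is the quantity temporal difference accounts identify with the cue-evoked dopamine transient.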
