Paper information - Dopamine Ramps Are a Consequence of Reward Prediction Errors

Dopamine Ramps Are a Consequence of Reward Prediction Errors

Temporal difference learning models of dopamine assert that phasic levels of dopamine encode a reward prediction error. However, this hypothesis has been challenged by recent observations of gradually ramping striatal dopamine levels as a goal is approached. This note describes conditions under which temporal difference learning models predict dopamine ramping. The key idea is representational: a quadratic transformation of proximity to the goal implies approximately linear ramping, as observed experimentally.
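
The representational argument can be checked numerically. The sketch below is an illustration only, not the paper's simulation. It assumes a linear track, a value estimate exactly proportional to squared proximity to the goal, and no reward before the goal is reached. The function name `td_errors` and the parameters `n_steps`, `gamma`, and `weight` are hypothetical.

```python
def td_errors(n_steps=20, gamma=1.0, weight=1.0):
    """TD errors on a linear track when value = weight * proximity**2.

    Proximity x_t runs from 0 (start) to 1 (goal). No reward is given
    before the goal, so delta_t = gamma * V(x_{t+1}) - V(x_t).
    All of this is an assumed toy setup, not the paper's model.
    """
    proximity = [t / n_steps for t in range(n_steps + 1)]
    value = [weight * x ** 2 for x in proximity]
    return [gamma * value[t + 1] - value[t] for t in range(n_steps)]


def increments(deltas):
    """Step-to-step change in the TD error; constant means a linear ramp."""
    return [later - earlier for earlier, later in zip(deltas, deltas[1:])]


# Undiscounted case: the ramp is exactly linear.
exact = td_errors(gamma=1.0)
exact_steps = increments(exact)

# Mild discounting: the ramp still rises, but only approximately linearly.
discounted = td_errors(gamma=0.98)
discounted_steps = increments(discounted)

if __name__ == "__main__":
    for t, delta in enumerate(discounted):
        print(f"step {t:2d}  TD error {delta:.5f}")
```

Under these assumptions, with `gamma = 1` the TD error at step t equals `weight * (2t + 1) / n_steps**2`, which is exactly linear in t. With `gamma` slightly below 1 a small negative quadratic term appears, so the ramp is only approximately linear.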

Samuel Gershman
