Paper Information - Dopamine Bonuses

Dopamine Bonuses

Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al.'s non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.
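
As a rough illustration of the quantities the abstract names, the sketch below computes a standard TD error and adds a potential-based shaping bonus of the form usually associated with Ng et al. The function names, the potential values `phi_*`, and the discount `gamma` are assumptions made for illustration; this is not the paper's own model of DA activity.

```python
def td_error(reward, v_next, v_curr, gamma=0.9):
    """Standard TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_curr


def shaping_bonus(phi_next, phi_curr, gamma=0.9):
    """Potential-based shaping bonus: F = gamma * phi(s') - phi(s).

    phi is a hypothetical potential function over states.
    """
    return gamma * phi_next - phi_curr


def td_error_with_bonus(reward, v_next, v_curr, phi_next, phi_curr, gamma=0.9):
    """TD error computed on the reward plus the shaping bonus."""
    bonus = shaping_bonus(phi_next, phi_curr, gamma)
    return td_error(reward + bonus, v_next, v_curr, gamma)


def discounted_bonus_sum(potentials, gamma=0.9):
    """Discounted sum of shaping bonuses along a trajectory of potentials.

    The sum telescopes to gamma**T * phi(s_T) - phi(s_0), so it depends
    only on the start and end states, not on the path in between.
    """
    total = 0.0
    for t in range(len(potentials) - 1):
        total += gamma ** t * shaping_bonus(potentials[t + 1], potentials[t], gamma)
    return total
```

Under these assumptions, a transition with zero reward can still produce a positive error signal when the potential rises, which is one way to picture a response to a non-rewarding stimulus. Because the discounted bonuses telescope along a trajectory, a bonus of this form does not change which policy is optimal, which is the sense in which it is "non-distorting".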

Peter Dayan | Sham M. Kakade
