Theory meets pigeons: The influence of reward-magnitude on discrimination-learning

Modern theoretical accounts of reward-based learning are commonly based on reinforcement learning algorithms. The most prominent of these is the temporal-difference (TD) algorithm, in which the difference between predicted and obtained reward, the prediction error, serves as a learning signal. Consequently, larger rewards produce larger prediction errors and lead to faster learning than smaller rewards. If animals employ a neural implementation of TD learning, reward magnitude should therefore affect their learning accordingly. Here we test this prediction by training pigeons on a simple color-discrimination task with two pairs of colors. In each pair, correct discrimination is rewarded: in pair one with a large reward, in pair two with a small reward. Pigeons acquired the 'large-reward' discrimination faster than the 'small-reward' discrimination. Animal behavior and an implementation of the TD algorithm yielded comparable results with respect to the difference between learning curves in the large-reward and small-reward conditions. We conclude that the influence of reward magnitude on the acquisition of a simple discrimination task is accurately reflected by a TD implementation of reinforcement learning.
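The link between reward magnitude and learning speed can be illustrated with a minimal sketch. This is not the implementation used in the study: it assumes a single-step task (so the TD update reduces to a simple delta rule), a softmax choice between the rewarded and unrewarded color, and an expected (mean-field) update weighted by the choice probability. The function names and the values of `alpha`, `beta`, the reward sizes, and the criterion are hypothetical and chosen only for illustration.

```python
import math


def simulate(reward, alpha=0.1, beta=8.0, n_trials=200):
    """Return the per-trial probability of choosing the rewarded color.

    Assumptions (illustrative only): one-step task, delta-rule value update,
    softmax choice, and an expected update weighted by choice probability.
    """
    v_correct = 0.0  # learned value of the rewarded color
    v_wrong = 0.0    # unrewarded color: its prediction error is 0, so it stays 0
    curve = []
    for _ in range(n_trials):
        # Softmax over two options reduces to a logistic function
        p_correct = 1.0 / (1.0 + math.exp(-beta * (v_correct - v_wrong)))
        curve.append(p_correct)
        # Prediction error: obtained reward minus predicted reward
        delta = reward - v_correct
        # The value changes only on trials where the rewarded color is chosen,
        # so the expected change is scaled by p_correct
        v_correct += p_correct * alpha * delta
    return curve


def trials_to_criterion(curve, criterion=0.8):
    """Index of the first trial at which performance reaches the criterion."""
    for trial, p in enumerate(curve):
        if p >= criterion:
            return trial
    return None


large = simulate(reward=1.0)    # hypothetical large-reward pair
small = simulate(reward=0.25)   # hypothetical small-reward pair

print("large reward: criterion reached at trial", trials_to_criterion(large))
print("small reward: criterion reached at trial", trials_to_criterion(small))
```

Under these assumptions both curves start at chance, and the large-reward curve reaches the criterion in fewer trials because each rewarded trial produces a larger prediction error and therefore a larger value update.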
