Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis

The study of midbrain dopaminergic neurons has advanced considerably in recent years. Understanding these advances, and how they relate to one another, requires a solid grasp of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. Taken together, theory and experiment now strongly suggest that the phasic activity of midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings at the root of this conclusion and the major theoretical advances from which it is drawn.
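The idea of a reward prediction error can be made concrete with a small numerical sketch. The code below is an illustration only: it assumes a simple delta-rule update of the Rescorla-Wagner kind, which the abstract does not itself specify, and the function name `delta_rule_learning`, the learning rate `alpha`, and the reward sequence are hypothetical choices made for the example. It is not a model of dopamine neuron firing.

```python
def delta_rule_learning(rewards, alpha=0.1, initial_value=0.0):
    """Track a reward estimate with a delta rule (assumed for illustration).

    Returns two lists: the estimate held before each trial, and the
    prediction error (received reward minus estimate) on that trial.
    """
    value = initial_value
    values, errors = [], []
    for reward in rewards:
        error = reward - value      # reward prediction error on this trial
        values.append(value)
        errors.append(error)
        value += alpha * error      # move the estimate toward the reward
    return values, errors


# Hypothetical schedule: 50 rewarded trials, then one trial with the reward omitted.
values, errors = delta_rule_learning([1.0] * 50 + [0.0])

print("error on first rewarded trial:", errors[0])
print("error on last rewarded trial:", errors[49])
print("error when reward is omitted:", errors[50])
```

Under these assumptions the error is large when the reward is unexpected, shrinks toward zero as the reward becomes predicted, and turns negative when a predicted reward is omitted. This is the qualitative pattern that the reward prediction error hypothesis attributes to phasic dopamine activity.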
