Neurocomputational mechanisms of reinforcement-guided learning in humans: A review

Adapting decision making to dynamic and probabilistic changes in action-reward contingencies is critical for survival in a competitive, resource-limited world. Much research has focused on elucidating the neural systems and computations by which the brain identifies whether the consequences of actions are relatively good or bad. In contrast, less empirical work has addressed the mechanisms by which reinforcements are used to guide decision making. Here, I review recent studies that attempt to bridge this gap by characterizing how humans use reward information to guide and optimize decision making. Regions implicated in reinforcement processing, including the striatum, orbitofrontal cortex, and anterior cingulate cortex, also appear to mediate how reinforcements are used to adjust subsequent decisions. This research offers insight into why the brain devotes resources to evaluating reinforcements, and it suggests a shift for future research: from studying the mechanisms of reinforcement processing to studying the mechanisms of reinforcement learning.
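The reinforcement-guided learning the review describes is commonly formalized as a prediction-error (delta-rule) update of action values, with choices biased toward higher-valued actions. The sketch below is not taken from the review itself; it is a minimal, generic illustration of that framework, assuming a two-armed bandit with probabilistic rewards, a softmax choice rule, and arbitrarily chosen learning-rate (alpha) and inverse-temperature (beta) parameters.

```python
import math
import random

def softmax(values, beta=3.0):
    """Convert action values into choice probabilities (beta = inverse temperature)."""
    exps = [math.exp(beta * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def run_bandit(reward_probs, alpha=0.1, n_trials=1000, seed=0):
    """Learn action values from probabilistic rewards via the
    prediction-error update V <- V + alpha * (r - V)."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    for _ in range(n_trials):
        probs = softmax(values)
        # sample an action in proportion to its choice probability
        action = rng.choices(range(len(values)), weights=probs)[0]
        # probabilistic reward: 1 with the action's payoff probability, else 0
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # delta rule: move the chosen value toward the obtained reward
        values[action] += alpha * (reward - values[action])
    return values

learned = run_bandit([0.8, 0.2])
```

After enough trials, the learned value of the richer option approaches its true payoff probability, and the softmax rule increasingly exploits it, which is the core dynamic-adjustment behavior the reviewed studies probe.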
