Neural basis of reinforcement learning and decision making.

Reinforcement learning is an adaptive process in which an animal utilizes its previous experience to improve the outcomes of future choices. Computational theories of reinforcement learning play a central role in the newly emerging areas of neuroeconomics and decision neuroscience. In this framework, actions are chosen according to their value functions, which describe how much future reward is expected from each action. Value functions can be adjusted not only through reward and penalty, but also by the animal's knowledge of its current environment. Studies have revealed that a large proportion of the brain is involved in representing and updating value functions and using them to choose an action. However, how the nature of a behavioral task affects the neural mechanisms of reinforcement learning remains incompletely understood. Future studies should uncover the principles by which different computational elements of reinforcement learning are dynamically coordinated across the entire brain.
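As a rough illustration of the value-function idea described above, the sketch below updates an action's value from reward using a simple prediction-error rule and converts values into choice probabilities with a softmax. This is a generic textbook-style formulation, not a model taken from the paper; the function names, the learning rate, and the inverse temperature are all assumptions chosen for illustration.

```python
import math


def update_value(value, reward, learning_rate=0.1):
    """Move a value estimate toward the received reward.

    The step is proportional to the prediction error (reward - value).
    The learning_rate default is an illustrative assumption.
    """
    prediction_error = reward - value
    return value + learning_rate * prediction_error


def softmax_choice_probabilities(values, inverse_temperature=1.0):
    """Convert action values into choice probabilities.

    Higher-valued actions are chosen more often; inverse_temperature
    (an assumed parameter) controls how deterministic the choice is.
    """
    max_value = max(values)  # subtract the max for numerical stability
    exps = [math.exp(inverse_temperature * (v - max_value)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical actions; only the first is ever rewarded in this toy run.
values = [0.0, 0.0]
for _ in range(20):
    values[0] = update_value(values[0], reward=1.0)

probs = softmax_choice_probabilities(values, inverse_temperature=3.0)
```

After repeated reward, the value of the first action approaches the reward magnitude, and the softmax assigns it the larger choice probability. Knowledge-based (model-based) adjustments of value, which the abstract also mentions, are not covered by this sketch.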
