Reinforcement learning in the brain

[1]  W. Brown Animal Intelligence: Experimental Studies , 1912, Nature.

[2]  B. Skinner Two Types of Conditioned Reflex and a Pseudo Type , 1935 .

[3]  J. Konorski Conditioned reflexes and neuron organization. , 1948 .

[4]  R. R. Bush,et al.  A mathematical model for simple learning. , 1951, Psychological review.

[5]  Arthur L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[6]  Ronald A. Howard,et al.  Dynamic Programming and Markov Processes , 1960 .

[7]  G. S. Reynolds,et al.  Attention in the pigeon. , 1961, Journal of the experimental analysis of behavior.

[8]  R. Rescorla,et al.  INHIBITION OF AVOIDANCE BEHAVIOR. , 1965, Journal of comparative and physiological psychology.

[9]  L. Kamin Predictability, surprise, attention, and conditioning , 1967 .

[10]  B. Campbell,et al.  Punishment and aversive behavior , 1969 .

[11]  R. Rescorla Reduction in the effectiveness of reinforcement after prior excitatory conditioning , 1970 .

[12]  R. Rescorla,et al.  A theory of Pavlovian conditioning : Variations in the effectiveness of reinforcement and nonreinforcement , 1972 .

[13]  W. F. Prokasy,et al.  Classical conditioning II: Current research and theory. , 1972 .

[14]  J. Gibbon Scalar expectancy theory and Weber's law in animal timing. , 1977 .

[15]  R. Wise,et al.  Neuroleptic-induced "anhedonia" in rats: pimozide blocks reward quality of food. , 1978, Science.

[16]  E. Kremer The Rescorla-Wagner model: losses in associative strength in compound conditioned stimuli. , 1978, Journal of experimental psychology. Animal behavior processes.

[17]  R. Wise,et al.  Major attenuation of food reward with performance-sparing doses of pimozide in the rat. , 1978, Canadian journal of psychology.

[18]  Christopher D. Adams,et al.  Instrumental Responding following Reinforcer Devaluation , 1981 .

[19]  Christopher D. Adams Variations in the Sensitivity of Instrumental Responding to Reinforcer Devaluation , 1982 .

[20]  Christopher D. Adams,et al.  The Effect of the Instrumental Training Contingency on Susceptibility to Reinforcer Devaluation , 1983 .

[21]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[22]  A. Dickinson Actions and habits: the development of behavioural autonomy , 1985 .

[23]  K. Wilcox,et al.  Stimulation of the lateral habenula inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area of the rat , 1986, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[24]  Richard S. Sutton,et al.  Sequential Decision Problems and Neural Networks , 1989, NIPS 1989.

[25]  A. Barto,et al.  Learning and Sequential Decision Making , 1989 .

[26]  W. Schultz,et al.  Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. , 1990, Journal of neurophysiology.

[27]  Richard S. Sutton,et al.  Time-Derivative Models of Pavlovian Reinforcement , 1990 .

[28]  M. Gabriel,et al.  Learning and Computational Neuroscience: Foundations of Adaptive Networks , 1990 .

[29]  A. Grace Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis for the etiology of schizophrenia , 1991, Neuroscience.

[30]  Terrence J. Sejnowski,et al.  Using Aperiodic Reinforcement for Directed Self-Organization During Development , 1992, NIPS.

[31]  W. Schultz,et al.  Responses of monkey dopamine neurons during learning of behavioral reactions. , 1992, Journal of neurophysiology.

[32]  W. Schultz,et al.  Neuronal activity in monkey ventral striatum related to the expectation of reward , 1992, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[33]  Terrence J. Sejnowski,et al.  Foraging in an Uncertain Environment Using Predictive Hebbian Learning , 1993, NIPS.

[34]  A. Parent,et al.  Anatomical aspects of information processing in primate basal ganglia , 1993, Trends in Neurosciences.

[35]  W. Schultz,et al.  Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[36]  Joel L. Davis,et al.  A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement , 1994 .

[37]  Andrew G. Barto,et al.  Reinforcement learning control , 1994, Current Opinion in Neurobiology.

[38]  D. Joel,et al.  The organization of the basal ganglia-thalamocortical circuits: Open interconnected rather than closed segregated , 1994, Neuroscience.

[39]  O. Hikosaka  Models of information processing in the basal ganglia, edited by James C. Houk, Joel L. Davis and David G. Beiser (The MIT Press, 1995) , 1995, Trends in Neurosciences.

[40]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[41]  Peter Dayan,et al.  Bee foraging in uncertain environments using predictive hebbian learning , 1995, Nature.

[42]  B. Balleine,et al.  Motivational control of heterogeneous instrumental chains. , 1995 .

[43]  Leemon C. Baird,et al.  Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.

[44]  J. Wickens,et al.  Cellular models of reinforcement. , 1995 .

[45]  W. Schultz,et al.  Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli , 1996, Nature.

[46]  J. Wickens,et al.  Dopamine reverses the depression of rat corticostriatal synapses which normally follows high-frequency stimulation of cortex In vitro , 1996, Neuroscience.

[47]  P. Dayan,et al.  A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[48]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Athena Scientific.

[49]  Peter Dayan,et al.  A Neural Substrate of Prediction and Reward , 1997, Science.

[50]  Stuart J. Russell,et al.  Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.

[51]  A. Kacelnik  Normative and descriptive models of decision making: time discounting and risk sensitivity. , 1997, Ciba Foundation symposium.

[52]  E. Bullmore,et al.  Society for Neuroscience Abstracts , 1997 .

[53]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[54]  J. Hollerman,et al.  Dopamine neurons report an error in the temporal prediction of reward during learning , 1998, Nature Neuroscience.

[55]  P. Goldman-Rakic,et al.  Dopaminergic regulation of cerebral cortical microcirculation , 1998, Nature Neuroscience.

[56]  B. Balleine,et al.  Goal-directed instrumental action: contingency and incentive learning and their cortical substrates , 1998, Neuropharmacology.

[57]  K. Berridge,et al.  What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? , 1998, Brain Research Reviews.

[58]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[59]  P. Redgrave,et al.  Is the short-latency dopamine response too short to signal reward error? , 1999, Trends in Neurosciences.

[60]  P. Holland,et al.  Amygdala circuitry in attentional and representational processes , 1999, Trends in Cognitive Sciences.

[61]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[62]  Vijay R. Konda,et al.  Actor-Critic Algorithms , 1999, NIPS.

[63]  S. Ikemoto,et al.  The role of nucleus accumbens dopamine in motivated behavior: a unifying interpretation with special reference to reward-seeking , 1999, Brain Research Reviews.

[64]  Kenji Doya,et al.  Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.

[65]  J. Wickens,et al.  Dopamine and synaptic plasticity in the neostriatum , 2000, Journal of anatomy.

[66]  S. Kakade,et al.  Learning and selective attention , 2000, Nature Neuroscience.

[67]  R. Malenka,et al.  Dopaminergic modulation of neuronal excitability in the striatum and nucleus accumbens. , 2000, Annual review of neuroscience.

[68]  J. Horvitz Mesolimbocortical and nigrostriatal dopamine responses to salient non-reward events , 2000, Neuroscience.

[69]  C. Gallistel,et al.  Time, rate, and conditioning. , 2000, Psychological review.

[70]  Samuel M. McClure,et al.  Predictability Modulates Human Brain Response to Reward , 2001, The Journal of Neuroscience.

[71]  J. Wickens,et al.  A cellular mechanism of reward-related learning , 2001, Nature.

[72]  Peter Dayan,et al.  Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems , 2001 .

[73]  W. Schultz,et al.  Dopamine responses comply with basic assumptions of formal learning theory , 2001, Nature.

[74]  Brian Knutson,et al.  Dissociation of reward anticipation and outcome with event-related fMRI , 2001, Neuroreport.

[75]  Brian Knutson,et al.  Anticipation of Increasing Monetary Reward Selectively Recruits Nucleus Accumbens , 2001, The Journal of Neuroscience.

[76]  Sham M. Kakade,et al.  Opponent interactions between serotonin and dopamine , 2002, Neural Networks.

[77]  J. O'Doherty,et al.  Neural Responses during Anticipation of a Primary Taste Reward , 2002, Neuron.

[78]  Doina Precup,et al.  Learning Options in Reinforcement Learning , 2002, SARA.

[79]  S. Killcross,et al.  Associative representations of emotionally significant outcomes , 2002 .

[80]  W. Schultz Getting Formal with Dopamine and Reward , 2002, Neuron.

[81]  S. S. Stevens,et al.  Learning, motivation, and emotion , 2002 .

[82]  Eytan Ruppin,et al.  Actor-critic models of the basal ganglia: new anatomical and computational perspectives , 2002, Neural Networks.

[83]  David S. Touretzky,et al.  Timing and Partial Observability in the Dopamine System , 2002, NIPS.

[84]  D. Joel,et al.  Dopamine in schizophrenia: dysfunctional information processing in basal ganglia–thalamocortical split circuits , 2002 .

[85]  B. Everitt,et al.  Emotion and motivation: the role of the amygdala, ventral striatum, and prefrontal cortex , 2002, Neuroscience & Biobehavioral Reviews.

[86]  P. Montague,et al.  Activity in human ventral striatum locked to errors of reward prediction , 2002, Nature Neuroscience.

[87]  J. Salamone,et al.  Motivational views of reinforcement: implications for understanding the behavioral functions of nucleus accumbens dopamine , 2002, Behavioural Brain Research.

[88]  Karl J. Friston,et al.  Temporal difference learning model accounts for responses in human ventral striatum , 2002 .

[89]  David S. Touretzky,et al.  Long-Term Reward Prediction in TD Models of the Dopamine System , 2002, Neural Computation.

[90]  Peter Dayan,et al.  Dopamine: generalization and bonuses , 2002, Neural Networks.

[91]  D. Attwell,et al.  The neural basis of functional brain imaging signals , 2002, Trends in Neurosciences.

[92]  M. Oaksford,et al.  Emotional cognition: from brain to behaviour , 2002 .

[93]  B. Balleine,et al.  The Role of Learning in the Operation of Motivational Systems , 2002 .

[94]  Brian Knutson,et al.  A region of mesial prefrontal cortex tracks monetarily rewarding outcomes: characterization with rapid event-related fMRI , 2003, NeuroImage.

[95]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[96]  W. Schultz,et al.  Coding of Predicted Reward Omission by Dopamine Neurons in a Conditioned Inhibition Paradigm , 2003, The Journal of Neuroscience.

[97]  Samuel M. McClure,et al.  A computational substrate for incentive salience , 2003, Trends in Neurosciences.

[98]  Samuel M. McClure,et al.  Temporal Prediction Errors in a Passive Learning Task Activate Human Striatum , 2003, Neuron.

[99]  Karl J. Friston,et al.  Temporal Difference Models and Reward-Related Learning in the Human Brain , 2003, Neuron.

[100]  N. Logothetis The Underpinnings of the BOLD Functional Magnetic Resonance Imaging Signal , 2003, The Journal of Neuroscience.

[101]  W. Schultz,et al.  Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons , 2003, Science.

[102]  M. Delgado,et al.  Dorsal striatum responses to reward and punishment: Effects of valence and magnitude manipulations , 2003, Cognitive, affective & behavioral neuroscience.

[103]  A. Grace,et al.  Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission , 2003, Nature Neuroscience.

[104]  S. Killcross,et al.  Coordination of actions and habits in the medial prefrontal cortex of rats. , 2003, Cerebral cortex.

[105]  P. Garris,et al.  ‘Passive stabilization’ of striatal extracellular dopamine across the lesion spectrum encompassing the presymptomatic phase of Parkinson's disease: a voltammetric study in the 6‐OHDA‐lesioned rat , 2003, Journal of neurochemistry.

[106]  P. Montague,et al.  Dynamic Gain Control of Dopamine Delivery in Freely Moving Animals , 2004, The Journal of Neuroscience.

[107]  J. Bolam,et al.  Uniform Inhibition of Dopamine Neurons in the Ventral Tegmental Area by Aversive Stimuli , 2004, Science.

[108]  Samuel M. McClure,et al.  Neural Correlates of Behavioral Preference for Culturally Familiar Drinks , 2004, Neuron.

[109]  Karl J. Friston,et al.  Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning , 2004, Science.

[110]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.

[111]  O. Hikosaka,et al.  A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. , 2004, Journal of neurophysiology.

[112]  Nikos K Logothetis,et al.  Interpreting the BOLD signal. , 2004, Annual review of physiology.

[113]  F. McGlone,et al.  Dopamine Transmission in the Human Striatum during Monetary Reward Tasks , 2004, The Journal of Neuroscience.

[114]  B. Balleine,et al.  Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning , 2004, The European journal of neuroscience.

[115]  Peter Dayan,et al.  Temporal difference models describe higher-order learning in humans , 2004, Nature.

[116]  Andrew G. Barto,et al.  Using relative novelty to identify useful temporal abstractions in reinforcement learning , 2004, ICML.

[117]  E. Vaadia,et al.  Coincident but Distinct Messages of Midbrain Dopamine and Striatal Tonically Active Neurons , 2004, Neuron.

[118]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[119]  A. Grace,et al.  Dopaminergic modulation of limbic and cortical drive of nucleus accumbens in goal-directed behavior , 2005, Nature Neuroscience.

[120]  K. Doya,et al.  Representation of Action-Specific Reward Values in the Striatum , 2005, Science.

[121]  S. Geisler,et al.  Afferents of the ventral tegmental area in the rat‐anatomical substratum for integrative functions , 2005, The Journal of comparative neurology.

[122]  W. Schultz,et al.  Adaptive Coding of Reward Value by Dopamine Neurons , 2005, Science.

[123]  P. Dayan,et al.  Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.

[124]  K. Berridge Espresso reward learning, hold the dopamine: theoretical comment on Robinson et al. (2005). , 2005, Behavioral neuroscience.

[125]  B. Balleine Neural bases of food-seeking: Affect, arousal and reward in corticostriatolimbic circuits , 2005, Physiology & Behavior.

[126]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, MIT Press.

[127]  P. Glimcher,et al.  Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal , 2005, Neuron.

[128]  B. Balleine,et al.  The role of the dorsomedial striatum in instrumental conditioning , 2005, The European journal of neuroscience.

[129]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[130]  Peter Dayan,et al.  How fast to work: Response vigor, motivation and tonic dopamine , 2005, NIPS.

[131]  P. Dayan,et al.  Dopamine, uncertainty and TD learning , 2005, Behavioral and Brain Functions.

[132]  B. Balleine,et al.  Blockade of NMDA receptors in the dorsomedial striatum prevents action–outcome learning in instrumental conditioning , 2005, The European journal of neuroscience.

[133]  P. Dayan,et al.  Choice values , 2006, Nature Neuroscience.

[134]  Rui M. Costa,et al.  Rapid Alterations in Corticostriatal Ensemble Coordination during Acute Dopamine-Dependent Motor Dysfunction , 2006, Neuron.

[135]  K. Berridge The debate over dopamine’s role in reward: the case for incentive salience , 2007, Psychopharmacology.

[136]  P. Redgrave,et al.  The short-latency dopamine signal: a role in discovering novel actions? , 2006, Nature Reviews Neuroscience.

[137]  Brian Knutson,et al.  Linking nucleus accumbens dopamine and blood oxygenation , 2007, Psychopharmacology.

[138]  P. Dayan,et al.  Cortical substrates for exploratory decisions in humans , 2006, Nature.

[139]  S. Quartz,et al.  Neural Differentiation of Expected Reward and Risk in Human Subcortical Structures , 2006, Neuron.

[140]  R. Dolan,et al.  Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans , 2006, Nature.

[141]  P. Dayan,et al.  A normative perspective on motivation , 2006, Trends in Cognitive Sciences.

[142]  J. O'Doherty,et al.  The Role of the Ventromedial Prefrontal Cortex in Abstract State-Based Inference during Decision Making in Humans , 2006, The Journal of Neuroscience.

[144]  E. Vaadia,et al.  Midbrain dopamine neurons encode decisions for future action , 2006, Nature Neuroscience.

[145]  David S. Touretzky,et al.  Representation and Timing in Theories of the Dopamine System , 2006, Neural Computation.

[146]  Samuel M. McClure,et al.  Policy Adjustment in a Dynamic Economic Game , 2006, PloS one.

[147]  P. Dayan,et al.  Tonic dopamine: opportunity costs and the control of response vigor , 2007, Psychopharmacology.

[148]  P. Glimcher,et al.  Statistics of midbrain dopamine neuron spike trains in the awake primate. , 2007, Journal of neurophysiology.

[149]  N. Daw,et al.  Reinforcement Learning Signals in the Human Striatum Distinguish Learners from Nonlearners during Reward-Based Decision Making , 2007, The Journal of Neuroscience.

[150]  Y. Niv  The effects of motivation on habitual instrumental behavior , 2007 .

[151]  O. Hikosaka,et al.  Lateral habenula as a source of negative reward signals in dopamine neurons , 2007, Nature.

[152]  Mary Kay Lobo,et al.  Genetic control of instrumental conditioning by striatopallidal neuron–specific S1P receptor Gpr6 , 2007, Nature Neuroscience.

[153]  Jadin C. Jackson,et al.  Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. , 2007, Psychological review.

[154]  Dopamine responses to complex reward-predicting stimuli , 2007, Neuroscience Research.

[155]  M. Roesch,et al.  Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards , 2007, Nature Neuroscience.

[156]  Vivian V. Valentin,et al.  Determining the Neural Substrates of Goal-Directed Learning in the Human Brain , 2007, The Journal of Neuroscience.

[157]  Yasushi Kobayashi,et al.  Reward Prediction Error Computation in the Pedunculopontine Tegmental Nucleus Neurons , 2007, Annals of the New York Academy of Sciences.

[158]  Timothy E. J. Behrens,et al.  Learning the value of information in an uncertain world , 2007, Nature Neuroscience.

[159]  R. Wightman,et al.  Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens , 2007, Nature Neuroscience.

[160]  S. Kapur,et al.  Separate brain regions code for salience vs. valence during reward prediction in humans , 2007, Human brain mapping.

[161]  Sabrina M. Tom,et al.  The Neural Basis of Loss Aversion in Decision-Making Under Risk , 2007, Science.

[162]  S. Kapur,et al.  Temporal Difference Modeling of the Blood-Oxygen Level Dependent Response During Aversive Conditioning in Humans: Effects of Dopaminergic Modulation , 2007, Biological Psychiatry.

[163]  J. O'Doherty,et al.  Decoding the neural substrates of reward-related decision making with functional MRI , 2007, Proceedings of the National Academy of Sciences.

[164]  B. Moghaddam,et al.  Differential tonic influence of lateral habenula on prefrontal cortex and nucleus accumbens dopamine release , 2008, The European journal of neuroscience.

[165]  N. Daw,et al.  Striatal Activity Underlies Novelty-Based Choice in Humans , 2008, Neuron.

[166]  M. Botvinick Hierarchical models of behavior and prefrontal function , 2008, Trends in Cognitive Sciences.

[167]  Kae Nakamura,et al.  Reward-Dependent Modulation of Neuronal Activity in the Primate Dorsal Raphe Nucleus , 2008, The Journal of Neuroscience.

[168]  Y. Niv,et al.  Dialogues on prediction errors , 2008, Trends in Cognitive Sciences.

[169]  P. Dayan,et al.  Reinforcement learning: The Good, The Bad and The Ugly , 2008, Current Opinion in Neurobiology.

[170]  Colin Camerer,et al.  Neuroeconomics: decision making and the brain , 2008 .

[171]  M. Delgado,et al.  Representation of Subjective Value in the Striatum , 2009 .

[172]  M. Botvinick,et al.  Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective , 2009, Cognition.

[173]  R. Yerkes,et al.  The method of Pawlow in animal psychology. , 1909, Psychological Bulletin.

[174]  W. Schultz  Predictive reward signal of dopamine neurons. , 1998, Journal of neurophysiology.