Dopaminergic Control of Motivation and Reinforcement Learning: A Closed-Circuit Account for Reward-Oriented Behavior

Humans and animals act quickly when they expect their actions to lead to reward, reflecting their motivation. Injection of dopamine receptor antagonists into the striatum has been shown to slow such reward-seeking behavior, suggesting that dopamine is involved in the control of motivational processes. Meanwhile, neurophysiological studies have revealed that the phasic responses of dopamine neurons appear to represent reward prediction error, indicating that dopamine plays a central role in reinforcement learning. However, previous attempts to elucidate the mechanisms of these dopaminergic controls have not fully explained how the motivational and learning aspects are related, or whether they can be understood in terms of how the activity of dopamine neurons is itself controlled by upstream circuits. To address this issue, we constructed a closed-circuit model of the cortico-basal ganglia system based on recent findings regarding intracortical and corticostriatal circuit architectures. Simulations show that the model can reproduce the observed distinct motivational effects of D1- and D2-type dopamine receptor antagonists. At the same time, the model explains the dopaminergic representation of reward prediction error observed in behaving animals during learning tasks, and it can also explain the distinct choice biases induced by optogenetic stimulation of D1 and D2 receptor-expressing striatal neurons. These results indicate that the suggested roles of dopamine in motivational control and reinforcement learning can be understood in a unified manner through the notion that the indirect pathway of the basal ganglia represents the value of states/actions at a previous time point, an empirically driven key assumption of our model.
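
For readers unfamiliar with the term, the reward prediction error mentioned above is usually formalized as a temporal-difference (TD) error: the reward plus the discounted value of the current state, minus the value of the previous state. The sketch below illustrates only that textbook arithmetic on a hypothetical chain of states with a reward at the end; it is not the closed-circuit model described in the abstract. The function names, the chain task, and the parameters `alpha` and `gamma` are all assumptions made for illustration. The subtracted `v_previous` term is the quantity that, under the abstract's key assumption, would correspond to the value carried by the indirect pathway.

```python
def td_error(reward, v_current, v_previous, gamma=0.9):
    """Textbook TD error: reward + discounted current value - previous value."""
    return reward + gamma * v_current - v_previous


def run_chain(n_states=5, n_trials=200, alpha=0.1, gamma=0.9):
    """TD(0) learning on a hypothetical linear chain; reward on the last step.

    Returns the learned state values plus the per-step TD errors of the
    first and the last trial.
    """
    # One extra entry for the terminal state, whose value stays at 0.
    values = [0.0] * (n_states + 1)
    first_errors, last_errors = None, None
    for trial in range(n_trials):
        errors = []
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # values[s] plays the role of the "previous time point" value.
            delta = td_error(reward, values[s + 1], values[s], gamma)
            values[s] += alpha * delta
            errors.append(delta)
        if trial == 0:
            first_errors = errors
        last_errors = errors
    return values[:n_states], first_errors, last_errors


if __name__ == "__main__":
    values, first_errors, last_errors = run_chain()
    print("learned values:", [round(v, 3) for v in values])
    print("TD error at reward, first trial:", first_errors[-1])
    print("TD error at reward, last trial:", round(last_errors[-1], 6))
```

In this toy setting the error at the time of reward is large on the first trial and shrinks toward zero once the reward is fully predicted, which is the qualitative signature usually attributed to phasic dopamine responses.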
