Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning

Considerable evidence suggests that multiple learning systems can drive behavior. Choice can proceed reflexively from previous actions and their associated outcomes, as captured by “model-free” learning algorithms, or flexibly from prospective consideration of outcomes that might occur, as captured by “model-based” learning algorithms. However, the differential contributions of dopamine to these systems are poorly understood. Dopamine is widely thought to support model-free learning by modulating plasticity in the striatum. Model-based learning may also be affected by these striatal effects, or by dopaminergic effects elsewhere, notably on prefrontal working memory function. Indeed, prominent demonstrations linking striatal dopamine to putatively model-free learning did not rule out model-based effects, whereas other studies have reported dopaminergic modulation of verifiably model-based learning without distinguishing a prefrontal from a striatal locus. To clarify the relationships between dopamine, neural systems, and learning strategies, we combined a genetic association approach in humans with two well-studied reinforcement learning tasks: one isolating model-based from model-free behavior and the other sensitive to key aspects of striatal plasticity. Prefrontal function was indexed by a polymorphism in the COMT gene, variants of which reflect differences in dopamine levels in the prefrontal cortex. This polymorphism has been associated with differences in prefrontal activity and working memory. Striatal function was indexed by a polymorphism in the gene coding for DARPP-32, a protein that is densely expressed in the striatum, where it is necessary for synaptic plasticity. We found evidence for our hypothesis that variations in prefrontal dopamine relate to model-based learning, whereas variations in striatal dopamine function relate to model-free learning.
SIGNIFICANCE STATEMENT Decisions can stem reflexively from previously associated outcomes or flexibly from deliberative consideration of potential choice outcomes. Research implicates a dopamine-dependent striatal learning mechanism in the former type of choice. Although recent work has indicated that dopamine is also involved in flexible, goal-directed decision-making, it remains unclear whether dopamine contributes to such decisions via the striatum or via the dopamine-dependent working memory function of the prefrontal cortex. We examined genetic indices of dopamine function in these regions and their relation to the two choice strategies. We found that striatal dopamine function related most clearly to the reflexive strategy, as previously shown, and that prefrontal dopamine related most clearly to the flexible strategy. These findings suggest that dissociable brain regions support dissociable choice strategies.
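
The distinction between the reflexive ("model-free") and flexible ("model-based") strategies can be illustrated with a minimal sketch. It is not the authors' task or fitted model: the two-stage layout, the 0.7/0.3 transition probabilities, the learning rate `alpha`, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical two-stage task: 2 first-stage actions, 2 second-stage states.
# TRANSITIONS[a, s] = P(second-stage state s | first-stage action a).
# The 0.7/0.3 values are assumed for illustration.
TRANSITIONS = np.array([[0.7, 0.3],
                        [0.3, 0.7]])


def model_free_update(q_mf, action, reward, alpha=0.5):
    """Reflexive update: move the chosen action's value toward the reward."""
    q = q_mf.copy()
    q[action] += alpha * (reward - q[action])
    return q


def update_state_value(state_values, state, reward, alpha=0.5):
    """Learn the value of the second-stage state that was actually visited."""
    v = state_values.copy()
    v[state] += alpha * (reward - v[state])
    return v


def model_based_values(state_values, transitions=TRANSITIONS):
    """Prospective evaluation: expected state value under the transition model."""
    return transitions @ state_values


# One trial: action 0 leads, via a rare transition, to state 1, which is rewarded.
q_mf = model_free_update(np.zeros(2), action=0, reward=1.0)
v = update_state_value(np.zeros(2), state=1, reward=1.0)
q_mb = model_based_values(v)

print("model-free first-stage values: ", q_mf)
print("model-based first-stage values:", q_mb)
```

Under these assumptions, a reward that follows a rare transition raises the model-free value of the action just taken (action 0), whereas the model-based values favor the other action (action 1), because that action is more likely to lead back to the rewarded state. This divergence is what allows a task of this kind to separate the two strategies behaviorally.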
