Dorsal anterior cingulate-midbrain ensemble as a reinforcement meta-learner

The dorsal anterior cingulate cortex (dACC) is central to higher-order cognition and behavioural flexibility. Reinforcement learning, Bayesian decision-making, and cognitive control are currently the three main theoretical frameworks used to characterise the elusive computational nature of this brain area, so far with limited success. Here we propose a new model, the Reinforcement Meta Learner (RML), which builds on core insights into the anatomical connections of the ACC with two midbrain catecholamine nuclei: the ventral tegmental area (VTA) and the locus coeruleus (LC). With its dual role of selecting and implementing optimal decisions (via the VTA) and learning to control its own learning parameters (via the LC), the RML forms an autonomous control system that can learn to solve hierarchical decision problems without itself having an intrinsic hierarchical structure. We discuss how the model accounts for an unprecedented number of empirical findings across cognitive domains, assimilates several previously proposed ACC computations while respecting biological constraints, and provides theoretical integration at multiple levels. The model's core principle, recurrent connectivity between cortex and midbrain, offers a generic template for approaching meta-cognition computationally.
