Rational metareasoning and the plasticity of cognitive control

The human brain has the impressive capacity to adapt how it processes information to high-level goals. While it is known that these cognitive control skills are malleable and can be improved through training, the underlying plasticity mechanisms are not well understood. Here, we develop and evaluate a model of how people learn when to exert cognitive control, which controlled process to use, and how much effort to exert. We derive this model from a general theory according to which the function of cognitive control is to select and configure neural pathways so as to make optimal use of finite time and limited computational resources. The central idea of our Learned Value of Control model is that people use reinforcement learning to predict the value of candidate control signals of different types and intensities based on stimulus features. This model correctly predicts the learning and transfer effects underlying the adaptive control-demanding behavior observed in an experiment on visual attention and four experiments on interference control in Stroop and Flanker paradigms. Moreover, our model explained these findings significantly better than an associative learning model and a Win-Stay Lose-Shift model. Our findings elucidate how learning and experience might shape people’s ability and propensity to adaptively control their minds and behavior. We conclude by predicting under which circumstances these learning mechanisms might lead to self-control failure.
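The central idea, learning to predict the value of candidate control signals from stimulus features, can be illustrated with a minimal sketch. This is not the authors' Learned Value of Control implementation: it assumes a linear value predictor trained with a simple delta rule, and the feature encoding, the intensity grid `CANDIDATES`, the cost parameter, and the toy congruent/incongruent task in `toy_outcome` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical grid of candidate control-signal intensities.
CANDIDATES = [0.0, 0.5, 1.0]


def features(stimulus, intensity):
    """Bias, stimulus features, control intensity, and their interactions."""
    s = np.asarray(stimulus, dtype=float)
    return np.concatenate(([1.0], s, [intensity], intensity * s))


def predicted_value(weights, stimulus, intensity):
    """Linear prediction of the value of exerting `intensity` on `stimulus`."""
    return float(weights @ features(stimulus, intensity))


def select_intensity(weights, stimulus, candidates=CANDIDATES):
    """Pick the candidate intensity with the highest predicted value."""
    values = [predicted_value(weights, stimulus, c) for c in candidates]
    return candidates[int(np.argmax(values))]


def update(weights, stimulus, intensity, outcome, lr=0.1):
    """Delta-rule update toward the experienced outcome."""
    x = features(stimulus, intensity)
    error = outcome - weights @ x
    return weights + lr * error * x


def toy_outcome(stimulus, intensity, cost=0.3):
    """Assumed toy task: control only helps on incongruent trials.

    stimulus[0] == 1 marks an incongruent trial, where accuracy equals the
    control intensity; on congruent trials accuracy is 1 regardless.
    The cost of control grows linearly with intensity.
    """
    accuracy = intensity if stimulus[0] == 1 else 1.0
    return accuracy - cost * intensity


def train_toy_learner(epochs=1000, lr=0.1):
    """Sweep every (stimulus, intensity) pair so the sketch is deterministic."""
    weights = np.zeros(4)
    for _ in range(epochs):
        for stimulus in ([0.0], [1.0]):
            for intensity in CANDIDATES:
                outcome = toy_outcome(stimulus, intensity)
                weights = update(weights, stimulus, intensity, outcome, lr)
    return weights
```

After training on this toy task, the learner selects the highest intensity for incongruent stimuli and the lowest for congruent ones, because the interaction term between stimulus feature and intensity captures when control pays for its cost. A fuller treatment would sample outcomes and explore candidate signals stochastically rather than sweeping them exhaustively.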
