Meta-control of the exploration-exploitation dilemma emerges from probabilistic inference over a hierarchy of time scales

Cognitive control is typically understood as a set of mechanisms that enable humans to reach goals that require integrating the consequences of actions over longer time scales. Importantly, relying on routine behaviour or making choices that are beneficial only at short time scales would prevent one from attaining these goals. During the past two decades, researchers have proposed various computational cognitive models that successfully account for behaviour related to cognitive control in a wide range of laboratory tasks. As humans operate in a dynamic and uncertain environment, making elaborate plans and integrating experience over multiple time scales is computationally expensive. The specific question of how uncertain consequences at different time scales are integrated into adaptive decisions remains poorly understood. Here, we propose that precisely this problem of integrating experience and forming elaborate plans over multiple time scales is key to better understanding how human agents solve cognitive control dilemmas such as the exploration-exploitation dilemma. In support of this conjecture, we present a computational model of probabilistic inference over hidden states and actions, which are represented as a hierarchy of time scales. Simulations of goal-reaching agents instantiating the model in an uncertain and dynamic task environment show how the exploration-exploitation dilemma may be solved by inferring meta-control states which adapt behaviour to changing contexts.
