Hierarchical Reinforcement Learning as a Model of Human Task Interleaving

How do people decide how long to continue in a task, when to switch, and to which other task? Understanding the mechanisms that underpin task interleaving is a long-standing goal in the cognitive sciences. Prior work has proposed greedy heuristics and a policy that maximizes the marginal rate of return. However, it is unclear how such strategies would allow adaptation to everyday environments that offer multiple tasks with complex switch costs and delayed rewards. Here we develop a hierarchical model of supervisory control driven by reinforcement learning (RL). The supervisory level learns to switch between tasks using task-specific approximate utility estimates, which are computed on the lower level. A hierarchically optimal decomposition of the value function can be learned from experience, even in conditions with multiple tasks and arbitrary, uncertain reward and cost structures. The model reproduces known empirical effects of task interleaving. In a six-task problem (N=211), it predicts individual-level data better than a myopic baseline. The results support hierarchical RL as a plausible model of task interleaving.
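The split between a lower level that estimates task utilities and a supervisory level that learns where to switch can be illustrated on a toy problem. What follows is a minimal sketch under loudly stated assumptions. It is not the authors' model or task battery.

The setup is hypothetical: two tasks with hand-picked per-step rewards (one front-loaded, one with delayed reward), a fixed switch cost `SWITCH_COST`, and a fixed horizon. All names and numbers (`REWARDS`, `learn_task_values`, `train_supervisor`, and so on) are illustrative.

- **Lower level.** It learns each task's remaining value `V[i][p]` with TD(0).
- **Supervisory level.** It runs tabular Q-learning over a state that exposes only the active task, the rounded lower-level utility estimates, and the time left.
- **Baseline.** A one-step greedy policy stands in for a myopic baseline.

```python
import random
from collections import defaultdict

# Hypothetical toy problem (NOT the paper's six-task setup).
# Working one step on task i at progress p yields REWARDS[i][p].
REWARDS = [
    [5, 3, 1, 0, 0],  # task 0: front-loaded, diminishing returns
    [1, 1, 2, 6, 8],  # task 1: most reward is delayed
]
LENGTH = 5          # steps needed to finish a task
SWITCH_COST = 1.5   # assumed fixed cost for changing task
HORIZON = 8         # total steps available per episode
N = len(REWARDS)


def env_step(current, progress, task):
    """Work one step on `task`; return (reward net of switch cost, new progress)."""
    cost = SWITCH_COST if current is not None and task != current else 0.0
    p = progress[task]
    reward = (REWARDS[task][p] if p < LENGTH else 0.0) - cost
    new_progress = list(progress)
    new_progress[task] = min(p + 1, LENGTH)
    return reward, tuple(new_progress)


def learn_task_values(episodes=200, alpha=0.5):
    """Lower level: TD(0) estimate of each task's remaining reward V[i][p]."""
    V = [[0.0] * (LENGTH + 1) for _ in range(N)]
    for _ in range(episodes):
        for i in range(N):
            for p in range(LENGTH):
                V[i][p] += alpha * (REWARDS[i][p] + V[i][p + 1] - V[i][p])
    return V


def abstract_state(current, progress, t_left, V):
    """Supervisory state: active task, rounded lower-level utilities, time left.
    Raw task progress is not exposed to the supervisor."""
    utils = tuple(round(V[i][p], 1) for i, p in enumerate(progress))
    return (current, utils, t_left)


def train_supervisor(V, episodes=5000, alpha=0.5, epsilon=0.2, seed=0):
    """Upper level: epsilon-greedy tabular Q-learning over which task to work on next."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0] * N)
    for _ in range(episodes):
        current, progress = None, (0,) * N
        for t_left in range(HORIZON, 0, -1):
            s = abstract_state(current, progress, t_left, V)
            if rng.random() < epsilon:
                a = rng.randrange(N)
            else:
                a = max(range(N), key=lambda k: Q[s][k])
            r, progress = env_step(current, progress, a)
            current = a
            s_next = abstract_state(current, progress, t_left - 1, V)
            # Undiscounted episodic target; no bootstrap after the last step.
            target = r + (max(Q[s_next]) if t_left > 1 else 0.0)
            Q[s][a] += alpha * (target - Q[s][a])
    return Q


def greedy_policy(Q, V):
    """Act greedily with respect to the learned supervisory Q-values."""
    def policy(current, progress, t_left):
        s = abstract_state(current, progress, t_left, V)
        return max(range(N), key=lambda k: Q[s][k])
    return policy


def myopic_policy(current, progress, t_left):
    """Baseline: choose the task with the best immediate (one-step) payoff."""
    return max(range(N), key=lambda k: env_step(current, progress, k)[0])


def rollout(policy):
    """Total reward of one deterministic episode under `policy`."""
    current, progress, total = None, (0,) * N, 0.0
    for t_left in range(HORIZON, 0, -1):
        a = policy(current, progress, t_left)
        r, progress = env_step(current, progress, a)
        total += r
        current = a
    return total


if __name__ == "__main__":
    V = learn_task_values()
    Q = train_supervisor(V)
    print("myopic:", rollout(myopic_policy))
    print("hierarchical:", rollout(greedy_policy(Q, V)))
```

In this toy setting, the myopic policy stays with the front-loaded task after its rewards run out. Any switch there has a negative immediate payoff (1 minus 1.5). The supervisor, by contrast, can learn to pay the switch cost to reach the delayed reward.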
