Model-based hierarchical reinforcement learning and human action control

Recent work has reawakened interest in goal-directed or ‘model-based’ choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings. The resulting picture reveals how hierarchical model-based mechanisms might play a special and pivotal role in human decision-making, dramatically extending the scope and complexity of human behaviour.
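Because the abstract appeals to the computational framework of hierarchical reinforcement learning, a brief illustrative sketch may help fix ideas. This is not the paper's model; the toy world, option names, and rewards are hypothetical. It only shows, in miniature, what model-based evaluation over temporally abstract actions can look like: each option (a temporally extended action) carries a model predicting its terminal state and cumulative discounted reward, and choice proceeds by prospectively evaluating those option models rather than stepping through every primitive action.

```python
"""
Minimal sketch of hierarchical model-based control in the spirit of the
options framework (Sutton, Precup & Singh, 1999).  Options are fixed
primitive-action sequences; their models (terminal state, discounted
reward, duration) are built from a known primitive transition model and
then used for prospective, abstract-level choice.
All names, layouts, and reward values below are hypothetical.
"""

from dataclasses import dataclass

GAMMA = 0.95

# A tiny deterministic "rooms" world: states are room names, primitive
# actions move between adjacent rooms.  (Hypothetical layout.)
TRANSITIONS = {
    ("hall", "east"): "kitchen",
    ("hall", "west"): "office",
    ("kitchen", "north"): "pantry",
    ("office", "north"): "archive",
}
REWARDS = {"pantry": 1.0, "archive": 0.2}  # rewards on entering a room


@dataclass
class Option:
    """A temporally abstract action: a fixed primitive-action sequence."""
    name: str
    start: str
    actions: tuple


def build_option_model(option):
    """Roll the option forward through the primitive model to obtain an
    abstract model: (terminal state, discounted reward, duration)."""
    state, reward, discount = option.start, 0.0, 1.0
    for a in option.actions:
        state = TRANSITIONS[(state, a)]
        reward += discount * REWARDS.get(state, 0.0)
        discount *= GAMMA
    return state, reward, len(option.actions)


def plan(options, start="hall"):
    """Model-based choice at the abstract level: prospectively evaluate
    each applicable option via its model and pick the best predicted return."""
    candidates = [o for o in options if o.start == start]
    scored = [(build_option_model(o)[1], o.name) for o in candidates]
    return max(scored)


if __name__ == "__main__":
    options = [
        Option("go-to-pantry", "hall", ("east", "north")),
        Option("go-to-archive", "hall", ("west", "north")),
    ]
    value, choice = plan(options)
    print(f"chose {choice} with predicted return {value:.3f}")
```

The point of the sketch is the division of labour: the planner never touches primitive actions directly; it deliberates over option models, which is what gives hierarchical model-based control its reach over long behavioural sequences.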
