Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning (HRL) enables the autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. In recent years, the landscape of HRL research has grown considerably, yielding a wide variety of approaches. A comprehensive overview of this landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches with respect to the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is organized according to a novel taxonomy of the approaches. Based on the survey, we propose a set of important open problems to motivate future research in HRL. Furthermore, in the Supplementary Material we outline several task domains suitable for evaluating HRL approaches, along with some interesting examples of practical applications of HRL.
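To make the central idea concrete, below is a minimal, hand-crafted sketch (not taken from any of the surveyed methods) of temporal abstraction in the style of the options framework: a high-level policy chooses among temporally extended subtasks ("options"), and each option's low-level policy executes primitive actions until its termination condition fires. All names here (Option, high_level_policy, the toy corridor environment) are illustrative assumptions, not constructs from the cited papers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    """A subtask: an intra-option policy plus a termination condition."""
    name: str
    policy: Callable[[int], int]      # maps state -> primitive action (step direction)
    terminate: Callable[[int], bool]  # maps state -> whether the option should end


# Two hand-crafted options over a toy 1-D corridor with states 0..10.
# (Illustrative only; real HRL methods learn or discover such subtasks.)
options = [
    Option("reach-right-end", lambda s: +1, lambda s: s >= 10),
    Option("reach-left-end", lambda s: -1, lambda s: s <= 0),
]


def high_level_policy(state: int) -> Option:
    """A trivial manager: head for the nearer end of the corridor."""
    return options[1] if state < 5 else options[0]


def run_episode(start: int = 3, max_steps: int = 50) -> None:
    state, steps = start, 0
    while steps < max_steps and state not in (0, 10):
        option = high_level_policy(state)  # one high-level (semi-MDP) decision
        print(f"state {state}: selected option '{option.name}'")
        # The option's low-level policy acts until its termination condition fires.
        while not option.terminate(state) and steps < max_steps:
            state = max(0, min(10, state + option.policy(state)))
            steps += 1
        print(f"  option ended in state {state} after {steps} primitive steps")


if __name__ == "__main__":
    run_episode()
```

In actual HRL methods, both levels are learned rather than hand-specified, e.g., with policy-gradient or value-based updates applied at each level of the hierarchy; the sketch only shows how a single high-level decision can span many primitive time steps.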
