Hierarchical Reinforcement Learning
Ah-Hwee Tan | Hiok Chai Quek | Budhitama Subagdja | Shubham Pateria
[1] André da Motta Salles Barreto,et al. Graph-Based Skill Acquisition For Reinforcement Learning , 2019, ACM Comput. Surv..
[2] Willi-Hans Steeb,et al. Finite State Machines , 2001 .
[3] Doina Precup,et al. The Option-Critic Architecture , 2016, AAAI.
[4] L. Baum,et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .
[5] Ah-Hwee Tan,et al. Multi-agent Reinforcement Learning in Spatial Domain Tasks using Inter Subtask Empowerment Rewards , 2019, 2019 IEEE Symposium Series on Computational Intelligence (SSCI).
[6] Andrew G. Barto,et al. Skill Characterization Based on Betweenness , 2008, NIPS.
[7] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[8] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[9] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[10] Kate Saenko,et al. Learning Multi-Level Hierarchies with Hindsight , 2017, ICLR.
[11] Shimon Whiteson,et al. DAC: The Double Actor-Critic Architecture for Learning Options , 2019, NeurIPS.
[12] Lars Niklasson,et al. Time series segmentation using an adaptive resource allocating vector quantization network based on change detection , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.
[13] Doina Precup,et al. When Waiting is not an Option: Learning Options with a Deliberation Cost , 2017, AAAI.
[14] Pieter Abbeel,et al. Meta Learning Shared Hierarchies , 2017, ICLR.
[15] Alicia P. Wolfe,et al. Identifying useful subgoals in reinforcement learning by local graph partitioning , 2005, ICML.
[16] Sergey Levine,et al. Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? , 2019, ArXiv.
[17] Raia Hadsell,et al. CoMic: Complementary Task Learning & Mimicry for Reusable Skills , 2020, ICML.
[18] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[19] Peter Dayan,et al. Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning , 2019, ICLR.
[20] Andrew G. Barto,et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining , 2009, NIPS.
[21] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[22] Alessandro Lazaric,et al. Transfer in Reinforcement Learning: A Framework and a Survey , 2012, Reinforcement Learning.
[23] Daan Wierstra,et al. Variational Intrinsic Control , 2016, ICLR.
[24] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[25] Chelsea Finn,et al. Language as an Abstraction for Hierarchical Deep Reinforcement Learning , 2019, NeurIPS.
[26] Sridhar Mahadevan,et al. Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.
[27] Sergey Levine,et al. Learning to Walk via Deep Reinforcement Learning , 2018, Robotics: Science and Systems.
[28] Joshua B. Tenenbaum,et al. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.
[29] Tom Schaul,et al. FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.
[30] Pieter Abbeel,et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.
[31] Sergey Levine,et al. Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.
[32] Doina Precup,et al. Learning Options End-to-End for Continuous Action Tasks , 2017, ArXiv.
[33] Honglak Lee,et al. Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies , 2018, NeurIPS.
[34] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[35] Marlos C. Machado,et al. A Laplacian Framework for Option Discovery in Reinforcement Learning , 2017, ICML.
[36] Rob Fergus,et al. Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning , 2018, ArXiv.
[37] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[38] Doina Precup,et al. The Option Keyboard: Combining Skills in Reinforcement Learning , 2021, NeurIPS.
[39] Doina Precup,et al. Learning Options with Interest Functions , 2019, AAAI.
[40] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[41] Nahum Shimkin,et al. Unified Inter and Intra Options Learning Using Policy Gradient Methods , 2011, EWRL.
[42] R. Bellman,et al. On the Theory of Dynamic Programming , 1952, Proceedings of the National Academy of Sciences of the United States of America.
[43] Sergey Levine,et al. Near-Optimal Representation Learning for Hierarchical Reinforcement Learning , 2018, ICLR.
[44] Shimon Whiteson,et al. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning , 2018, ICML.
[45] Ion Stoica,et al. Multi-Level Discovery of Deep Options , 2017, ArXiv.
[46] Jonathan P. How,et al. Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments , 2016, AAAI.
[47] Peter Stone,et al. The utility of temporal abstraction in reinforcement learning , 2008, AAMAS.
[48] Jan Peters,et al. Probabilistic inference for determining options in reinforcement learning , 2016, Machine Learning.
[49] Karol Hausman,et al. Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.
[50] Ilya Kostrikov,et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play , 2017, ICLR.
[51] Haim Kaplan,et al. Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies , 2019, ALT.
[52] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[53] Mostafa Al-Emran,et al. Hierarchical Reinforcement Learning: A Survey , 2015 .
[54] Hussein A. Abbass,et al. Hierarchical Deep Reinforcement Learning for Continuous Action Control , 2018, IEEE Transactions on Neural Networks and Learning Systems.
[55] Pieter Abbeel,et al. Variational Option Discovery Algorithms , 2018, ArXiv.
[56] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[57] Andrew G. Barto,et al. Building Portable Options: Skill Transfer in Reinforcement Learning , 2007, IJCAI.
[58] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[59] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[60] Shie Mannor,et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning , 2002, ECML.
[61] Samuel Gershman,et al. Deep Successor Reinforcement Learning , 2016, ArXiv.
[62] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[63] Sergey Levine,et al. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning , 2019, NeurIPS.
[64] Shimon Whiteson,et al. Learning to Communicate with Deep Multi-Agent Reinforcement Learning , 2016, NIPS.
[65] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[66] Jürgen Schmidhuber,et al. Hierarchical reinforcement learning with subpolicies specializing for learned subgoals , 2004, Neural Networks and Computational Intelligence.
[67] Magnus Borga,et al. Hierarchical Reinforcement Learning , 1993 .
[68] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[69] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[70] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[71] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[72] Sridhar Mahadevan,et al. Learning to Take Concurrent Actions , 2002, NIPS.
[73] Sergey Levine,et al. Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings , 2018, ICML.
[74] Yoshua Bengio,et al. Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.
[75] Hongyuan Zha,et al. Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery , 2020, AAMAS.
[76] Sergey Levine,et al. Dynamics-Aware Unsupervised Discovery of Skills , 2019, ICLR.
[77] Doina Precup,et al. Options of Interest: Temporal Abstraction with Interest Functions , 2020, AAAI.
[78] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[79] Doina Precup,et al. Learning Options in Reinforcement Learning , 2002, SARA.
[80] Shimon Whiteson,et al. Counterfactual Multi-Agent Policy Gradients , 2017, AAAI.
[81] Tomás Lozano-Pérez,et al. A Framework for Multiple-Instance Learning , 1997, NIPS.
[82] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[83] Alborz Geramifard,et al. Decentralized control of Partially Observable Markov Decision Processes using belief space macro-actions , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[84] Razvan Pascanu,et al. Policy Distillation , 2015, ICLR.
[85] Li Wang,et al. Hierarchical Deep Multiagent Reinforcement Learning , 2018, ArXiv.
[86] Sergey Levine,et al. Deep Reinforcement Learning for Robotic Manipulation , 2016, ArXiv.
[87] D. R. Fulkerson,et al. On the Max Flow Min Cut Theorem of Networks , 1955 .
[88] Rob Fergus,et al. Learning Multiagent Communication with Backpropagation , 2016, NIPS.
[89] Richard Bellman,et al. Dynamic Programming Treatment of the Travelling Salesman Problem , 1962, JACM.
[90] Sergey Levine,et al. Diversity is All You Need: Learning Skills without a Reward Function , 2018, ICLR.
[91] Gerald Tesauro,et al. Learning Abstract Options , 2018, NeurIPS.
[92] Sergey Levine,et al. Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning , 2019, CoRL.
[93] Shie Mannor,et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.
[94] Doina Precup,et al. Option-critic in cooperative multi-agent systems , 2019, AAMAS.
[95] George Konidaris,et al. Option Discovery using Deep Skill Chaining , 2020, ICLR.
[96] Sridhar Mahadevan,et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..