Hierarchical Skills for Efficient Exploration

In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. However, prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design. In previous work on continuous control, the sensitivity of methods to this trade-off has not been addressed explicitly, as locomotion provides a suitable prior for navigation tasks, which have been of foremost interest. In this work, we analyze this trade-off for low-level policy pre-training with a new benchmark suite of diverse, sparse-reward tasks for bipedal robots. We alleviate the need for prior knowledge by proposing a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner. To utilize these skills on downstream tasks, we present a three-layered hierarchical learning algorithm that automatically trades off between general and specific skills as required by the respective task. In our experiments, we show that our approach performs this trade-off effectively and achieves better results than current state-of-the-art methods for end-to-end hierarchical reinforcement learning and unsupervised skill discovery. Code and videos are available at https://facebookresearch.github.io/hsd3.
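To make the three-layered setup concrete, below is a minimal sketch of one plausible reading of such a control loop: a top layer picks which pre-trained skill (goal space) to invoke, a middle layer picks a goal within that space, and a low-level goal-conditioned policy acts in the environment for a fixed horizon. This is an illustrative assumption based only on the abstract, not the released hsd3 code; all names here (DummyEnv, SkillPolicy, choose_skill, choose_goal, rollout) are hypothetical placeholders, and the learned policies are replaced by random stubs.

```python
import random


class DummyEnv:
    """Stand-in environment with a gym-like reset/step API (hypothetical)."""

    def __init__(self, obs_dim=10, act_dim=6):
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * self.obs_dim

    def step(self, action):
        self.t += 1
        obs = [random.gauss(0.0, 1.0) for _ in range(self.obs_dim)]
        reward = 0.0  # sparse-reward placeholder
        done = self.t >= 1000
        return obs, reward, done, {}


class SkillPolicy:
    """Pre-trained goal-reaching policy for one feature subspace (placeholder)."""

    def __init__(self, goal_dim, act_dim=6):
        self.goal_dim, self.act_dim = goal_dim, act_dim

    def act(self, obs, goal):
        # A real skill policy would map (obs, goal) to joint torques.
        return [random.uniform(-1.0, 1.0) for _ in range(self.act_dim)]


def choose_skill(obs, skills):
    """Top layer: select which goal space (skill) to use; stub for a learned policy."""
    return random.choice(sorted(skills))


def choose_goal(obs, skill, skills):
    """Middle layer: select a goal inside the chosen space; stub for a learned policy."""
    return [random.uniform(-1.0, 1.0) for _ in range(skills[skill].goal_dim)]


def rollout(env, skills, episode_len=1000, horizon=10):
    """Run one episode of the three-layered control loop."""
    obs, total_reward = env.reset(), 0.0
    for _ in range(0, episode_len, horizon):
        skill = choose_skill(obs, skills)        # layer 1: general vs. specific skill
        goal = choose_goal(obs, skill, skills)   # layer 2: goal within that skill's space
        for _ in range(horizon):                 # layer 3: low-level goal-conditioned control
            obs, reward, done, _ = env.step(skills[skill].act(obs, goal))
            total_reward += reward
            if done:
                return total_reward
    return total_reward


if __name__ == "__main__":
    # Example skill set: a low-dimensional (specific) skill and a
    # higher-dimensional (general) one, so the top layer has a real choice.
    skills = {"translate_xy": SkillPolicy(goal_dim=2), "full_pose": SkillPolicy(goal_dim=9)}
    print(rollout(DummyEnv(), skills))
```

The key degree of freedom in this sketch is the top layer's choice among skills of different goal dimensionality: lower-dimensional goal spaces constrain exploration more and speed up learning on tasks they match, while higher-dimensional ones retain fine-grained control, which is the generality/specificity trade-off the abstract describes.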
