Hierarchical Reinforcement Learning By Discovering Intrinsic Options

We propose a hierarchical reinforcement learning method, HIDIO, that can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks. Unlike current hierarchical RL approaches that tend to formulate goal-reaching low-level tasks or pre-define ad hoc lower-level policies, HIDIO encourages lower-level option learning that is independent of the task at hand, requiring few assumptions about, and little prior knowledge of, the task structure. These options are learned through an intrinsic entropy minimization objective conditioned on the option sub-trajectories, and the resulting options are diverse and task-agnostic. In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency than standard RL baselines and two state-of-the-art hierarchical RL methods. Code is available at https://www.github.com/jesbu1/hidio.
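To make the intrinsic objective mentioned above concrete, the following is a minimal sketch of the form such an entropy-minimization objective typically takes. The notation here is introduced for illustration and is an assumption, not a verbatim reproduction of the paper's equations: u_h denotes the option selected by the higher-level policy at option step h, \bar{s}_{h,t} denotes features of the sub-trajectory generated under that option up to lower-level step t, and q_\phi is a learned variational posterior (discriminator) over options.

% Sketch: option discovery via conditional-entropy minimization (notation assumed).
% The conditional entropy H(U | \bar{S}) is upper-bounded by the cross-entropy
% of the variational posterior q_\phi, so minimizing the entropy amounts to
% maximizing the discriminator's log-likelihood:
\min \; H(U \mid \bar{S})
  \;\le\; \min_\phi \; \mathbb{E}_{u_h,\,\bar{s}_{h,t}}
    \left[ -\log q_\phi(u_h \mid \bar{s}_{h,t}) \right],
\qquad
r^{\text{intr}}_{h,t} = \log q_\phi(u_h \mid \bar{s}_{h,t}).

Under this sketch, each lower-level step is rewarded for making the current option identifiable from the behavior it produces, which pushes the discovered options to be mutually distinguishable, and hence diverse, without ever referencing the task reward.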
