Hierarchical Reinforcement Learning By Discovering Intrinsic Options

We propose a hierarchical reinforcement learning method, HIDIO, that can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks. Unlike current hierarchical RL approaches, which tend to formulate goal-reaching low-level tasks or pre-define ad hoc lower-level policies, HIDIO encourages lower-level option learning that is independent of the task at hand, requiring few assumptions about, and little knowledge of, the task structure. These options are learned through an intrinsic entropy minimization objective conditioned on the option sub-trajectories. The learned options are diverse and task-agnostic. In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency than regular RL baselines and two state-of-the-art hierarchical RL methods. Code is available at https://www.github.com/jesbu1/hidio.
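To make the entropy objective concrete, the following is a hedged sketch in our own notation, not necessarily the paper's exact formulation. Writing $z$ for an option and $\tau$ for the sub-trajectory it generates, minimizing the conditional entropy $\mathcal{H}(Z \mid \mathcal{T})$ amounts to maximizing

  $-\mathcal{H}(Z \mid \mathcal{T}) \;=\; \mathbb{E}_{z,\tau}\!\left[\log p(z \mid \tau)\right] \;\ge\; \mathbb{E}_{z,\tau}\!\left[\log q_{\phi}(z \mid \tau)\right],$

where $q_{\phi}$ is a learned discriminator giving a standard variational lower bound. Under this reading, $\log q_{\phi}(z \mid \tau)$ can serve as a per-step intrinsic reward for the lower-level policy, encouraging options whose sub-trajectories make the generating option easy to identify, and hence diverse, distinguishable behaviors.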
