Distilling a Hierarchical Policy for Planning and Control via Representation and Reinforcement Learning

We present a hierarchical planning and control framework that enables an agent to perform various tasks and adapt flexibly to new ones. Rather than learning an individual policy for each particular task, the proposed framework, DISH, distills a hierarchical policy from a set of tasks via representation and reinforcement learning. The framework builds on latent variable models, which represent high-dimensional observations with low-dimensional latent variables. The resulting policy has two levels of hierarchy: (i) a planning module that reasons about a sequence of latent intentions leading to an optimistic future, and (ii) a feedback control policy, shared across tasks, that executes the inferred intention. Because planning is performed in the low-dimensional latent space, the learned policy can immediately be used to solve or adapt to new tasks without additional training. We demonstrate that the proposed framework learns compact representations (3- and 1-dimensional latent states and commands for a humanoid with 197-dimensional state features and 36-dimensional actions) while solving a small number of imitation tasks, and that the resulting policy is directly applicable to other types of tasks, i.e., navigation in cluttered environments.
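The abstract describes the architecture only at a high level. The sketch below illustrates one plausible organization of the two-level hierarchy and is not the authors' implementation: the network sizes, the random-shooting planner, and the names LowLevelPolicy, LatentDynamics, and plan_commands are all illustrative assumptions; only the state, action, and latent dimensions are taken from the abstract.

```python
import torch
import torch.nn as nn

# Dimensions quoted in the abstract; network widths below are arbitrary choices.
STATE_DIM, ACTION_DIM = 197, 36          # humanoid state features / actions
LATENT_STATE_DIM, LATENT_CMD_DIM = 3, 1  # latent state / latent command ("intention")

class LowLevelPolicy(nn.Module):
    """Shared feedback controller: (full state, latent command) -> action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_CMD_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, state, command):
        return self.net(torch.cat([state, command], dim=-1))

class LatentDynamics(nn.Module):
    """Learned transition model in latent space: (z_t, u_t) -> z_{t+1}."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_STATE_DIM + LATENT_CMD_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_STATE_DIM),
        )

    def forward(self, z, u):
        return self.net(torch.cat([z, u], dim=-1))

@torch.no_grad()
def plan_commands(dynamics, z0, cost_fn, horizon=10, n_samples=256):
    """High-level planner: sample candidate command sequences, roll them out
    through the latent dynamics, and return the lowest-cost sequence.
    (A simple random-shooting planner stands in here for whatever
    inference-based planner the paper actually uses.)"""
    candidates = torch.randn(n_samples, horizon, LATENT_CMD_DIM)
    z = z0.unsqueeze(0).expand(n_samples, -1)
    total_cost = torch.zeros(n_samples)
    for t in range(horizon):
        z = dynamics(z, candidates[:, t])
        total_cost = total_cost + cost_fn(z)   # cost_fn: (n, LATENT_STATE_DIM) -> (n,)
    return candidates[total_cost.argmin()]     # (horizon, LATENT_CMD_DIM)

# Example: plan toward a latent goal, then let the shared controller act.
if __name__ == "__main__":
    dynamics, policy = LatentDynamics(), LowLevelPolicy()
    z_goal = torch.zeros(LATENT_STATE_DIM)
    cost = lambda z: ((z - z_goal) ** 2).sum(dim=-1)
    plan = plan_commands(dynamics, z0=torch.randn(LATENT_STATE_DIM), cost_fn=cost)
    action = policy(torch.randn(STATE_DIM), plan[0])  # execute the first intention
```

In this sketch the property emphasized by the abstract falls out of the structure: the planner only touches the 3-dimensional latent state and 1-dimensional command, so adapting to a new task amounts to swapping in a new cost function while the shared low-level policy is reused unchanged.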
