Hierarchically Decoupled Imitation for Morphological Transfer

Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning. For such tasks, we argue that transferring learned information from a morphologically simpler agent can massively improve the sample efficiency of a more complex one. To this end, we propose a hierarchical decoupling of policies into two parts: an independently learned low-level policy and a transferable high-level policy. To remedy poor transfer performance due to mismatched morphologies, we contribute two key ideas. First, we show that incentivizing a complex agent's low-level policy to imitate a simpler agent's low-level policy significantly improves zero-shot high-level transfer. Second, we show that KL-regularized training of the high-level policy stabilizes learning and prevents mode collapse. Finally, on a suite of publicly released navigation and manipulation environments, we demonstrate the applicability of hierarchical transfer on long-range tasks across morphologies. Our code and videos can be found at this https URL.
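
The abstract compresses the method into two losses. As a purely illustrative sketch (not the authors' released implementation), the snippet below shows one way those two ideas could look in PyTorch: a low-level imitation term computed in a morphology-agnostic space, and a KL-regularized policy-gradient objective for the high level. The class and function names, the shared abstraction phi, and the coefficient beta are all assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of the two transfer losses named in the abstract.
# Assumptions: Gaussian MLP policies, a morphology-agnostic mapping phi
# (e.g. torso / center-of-mass trajectories), and a fixed KL weight beta.
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Small MLP mapping an input to a diagonal Gaussian; reused for both levels."""

    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, out_dim)
        self.log_std = nn.Parameter(torch.zeros(out_dim))

    def dist(self, x):
        h = self.body(x)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())


def low_level_imitation_loss(phi_complex_traj, phi_simple_traj):
    """Idea 1 (sketch): the complex agent's low-level policy imitates the
    simpler agent's low-level behavior in a shared space phi, so that the
    transferred high-level policy sees matching low-level outcomes.
    Here imitation is an L2 match between phi-trajectories of equal length."""
    return ((phi_complex_traj - phi_simple_traj) ** 2).sum(-1).mean()


def kl_regularized_hl_loss(hl_policy, hl_prior, obs, goals, advantages, beta=0.1):
    """Idea 2 (sketch): a plain policy-gradient term plus a KL penalty that
    keeps the high-level policy close to a reference distribution,
    stabilizing training and discouraging mode collapse."""
    dist = hl_policy.dist(obs)
    logp = dist.log_prob(goals).sum(-1)          # log-prob of sampled subgoals
    with torch.no_grad():
        prior = hl_prior.dist(obs)               # frozen reference policy
    kl = torch.distributions.kl_divergence(dist, prior).sum(-1)
    return -(advantages * logp).mean() + beta * kl.mean()
```

Note that the KL direction, the choice of phi, and how the simple agent's trajectories are time-aligned with the complex agent's are design decisions the paper itself would specify; this sketch fixes only one plausible instantiation.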
