Bottom-Up Skill Discovery From Unsegmented Demonstrations for Long-Horizon Robot Manipulation

We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations and use these skills to synthesize prolonged robot behaviors. Our method starts by constructing a hierarchical task structure from each demonstration through agglomerative clustering. From the task structures of multitask demonstrations, we identify skills based on recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning. Finally, we train a meta-controller to compose these skills to solve long-horizon manipulation tasks. The entire model can be trained on a small set of human demonstrations collected within 30 minutes without further annotations, making it amenable to real-world deployment. We systematically evaluated our method in simulation environments and on a real robot. Our method outperforms state-of-the-art imitation learning methods in multi-stage manipulation tasks. Furthermore, skills discovered from multitask demonstrations boost the average task success by 8% compared to those discovered from individual tasks.
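As a rough illustration of the first step, the sketch below performs a simple bottom-up (agglomerative) segmentation of one demonstration by repeatedly merging the two temporally adjacent segments whose mean features are closest. It is not the paper's implementation: the per-frame feature vectors, the squared-Euclidean merge criterion, the fixed target segment count, and the names `segment_demo`, `mean_vector`, and `sq_dist` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_demo(
    features: Sequence[Sequence[float]], num_segments: int
) -> List[Tuple[int, int]]:
    """Merge adjacent segments bottom-up until `num_segments` remain.

    Each segment is a half-open frame range (start, end). Only temporally
    adjacent segments may merge, so every cluster stays contiguous in time.
    The merge criterion (distance between segment means) is an assumption.
    """
    # Start with one segment per frame.
    segments = [(t, t + 1) for t in range(len(features))]
    while len(segments) > max(num_segments, 1):
        best_i, best_d = 0, float("inf")
        for i in range(len(segments) - 1):
            s0, e0 = segments[i]
            s1, e1 = segments[i + 1]
            d = sq_dist(mean_vector(features[s0:e0]), mean_vector(features[s1:e1]))
            if d < best_d:
                best_i, best_d = i, d
        # Replace the closest adjacent pair with their union.
        merged = (segments[best_i][0], segments[best_i + 1][1])
        segments[best_i:best_i + 2] = [merged]
    return segments


# Hypothetical 1-D features for a 7-frame demonstration with three phases.
demo_features = [[0.0], [0.1], [0.05], [5.0], [5.1], [9.0], [9.2]]
segments = segment_demo(demo_features, num_segments=3)
print(segments)
```

Recording the order of merges, rather than stopping at a fixed count, would yield a tree over the demonstration, which is one way to read the "hierarchical task structure" mentioned above.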