Paper Information - Bottom-Up Skill Discovery From Unsegmented Demonstrations for Long-Horizon Robot Manipulation

We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations and use these skills to synthesize prolonged robot behaviors. Our method starts by constructing a hierarchical task structure from each demonstration through agglomerative clustering. From the task structures of multitask demonstrations, we identify skills based on the recurring patterns and train goal-conditioned sensorimotor policies with hierarchical imitation learning. Finally, we train a meta controller to compose these skills to solve long-horizon manipulation tasks. The entire model can be trained on a small set of human demonstrations collected within 30 minutes without further annotations, making it amenable to real-world deployment. We systematically evaluated our method in simulation environments and on a real robot. Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks. Furthermore, skills discovered from multitask demonstrations boost the average task success by 8% compared to those discovered from individual tasks.
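The first step in the abstract, building a hierarchical task structure from one demonstration by agglomerative clustering, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each frame of a demonstration is already summarized as a small feature vector, that only temporally adjacent segments may merge, and that the merge cost is the squared distance between segment means. The names `agglomerate`, `mean_feature`, and `sq_dist` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def mean_feature(frames: List[List[float]], start: int, end: int) -> List[float]:
    """Average the per-frame features over frames[start:end]."""
    dim = len(frames[0])
    n = end - start
    return [sum(frames[t][d] for t in range(start, end)) / n for d in range(dim)]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(frames: List[List[float]], num_segments: int):
    """Greedily merge temporally adjacent segments, bottom-up.

    Assumption (not from the paper): merge cost is the squared distance
    between the mean features of two neighbouring segments.
    Returns the final segments and the list of merges, which together
    describe a binary merge tree over the demonstration.
    """
    # Start with one segment per frame.
    segments: List[Segment] = [(t, t + 1) for t in range(len(frames))]
    merges = []
    while len(segments) > max(num_segments, 1):
        # Cost of merging each pair of neighbouring segments.
        costs = [
            sq_dist(
                mean_feature(frames, *segments[i]),
                mean_feature(frames, *segments[i + 1]),
            )
            for i in range(len(segments) - 1)
        ]
        i = costs.index(min(costs))  # cheapest adjacent pair
        merged = (segments[i][0], segments[i + 1][1])
        merges.append((segments[i], segments[i + 1], merged))
        segments[i:i + 2] = [merged]
    return segments, merges


if __name__ == "__main__":
    # Toy 1-D "demonstration" with two visually distinct phases.
    demo = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0]]
    segs, tree = agglomerate(demo, num_segments=2)
    print(segs)
```

On the toy input, the two remaining segments split the sequence between frames 2 and 3, where the features jump. The recorded merges form the hierarchy, which could be cut at other levels to obtain coarser or finer segments.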
