One-shot Imitation Learning via Interaction Warping

Imitation learning of robot policies from few demonstrations is crucial in open-ended applications. We propose a new method, Interaction Warping, for learning SE(3) robotic manipulation policies from a single demonstration. We infer the 3D mesh of each object in the environment using shape warping, a technique for aligning point clouds across object instances. Then, we represent manipulation actions as keypoints on objects, which can be warped with the shape of the object. We show successful one-shot imitation learning on three simulated and real-world object re-arrangement tasks. We also demonstrate the ability of our method to predict object meshes and robot grasps in the wild.
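To make the keypoint idea concrete, the following is a toy sketch and not the paper's implementation. It assumes, purely for illustration, that an object instance is a canonical point cloud plus a per-point displacement field (the "warp"), and that an action is stored as indices of keypoints on the canonical cloud. The function names `warp_points`, `transfer_keypoints`, and `fit_rigid_transform` are hypothetical. The rigid fit uses the standard SVD-based least-squares alignment (Kabsch), which is swapped in here as a generic way to turn matched keypoints into an SE(3) pose.

```python
import numpy as np


def warp_points(canonical, displacement):
    """Apply a per-point displacement field to a canonical point cloud (N x 3)."""
    return canonical + displacement


def transfer_keypoints(keypoint_idx, canonical, displacement):
    """Keypoints are indices into the canonical cloud, so they move with the warp."""
    return warp_points(canonical, displacement)[keypoint_idx]


def fit_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~= src @ R.T + t (SVD-based Kabsch fit)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


rng = np.random.default_rng(0)
canonical = rng.normal(size=(50, 3))             # assumed canonical object shape
keypoint_idx = np.array([3, 10, 27, 41])         # assumed demonstrated contact points

# A new instance of the same category: here simply stretched along z.
displacement = canonical * np.array([0.0, 0.0, 0.5])
new_keypoints = transfer_keypoints(keypoint_idx, canonical, displacement)

# Suppose the new instance is observed under some pose; recover it from keypoints.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.2, -0.1, 0.4])
observed = new_keypoints @ R_true.T + t_true
R_est, t_est = fit_rigid_transform(new_keypoints, observed)
print("rotation error:", np.abs(R_est - R_true).max())
print("translation error:", np.abs(t_est - t_true).max())
```

Storing keypoints as indices rather than coordinates is the design choice that lets them follow the object's shape: any change to the displacement field moves the keypoints with it, without re-annotating the demonstration.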
