GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels

Tool use requires reasoning about the fit between an object's affordances and the demands of a task. Visual affordance learning can benefit from goal-directed interaction experience, but current techniques rely on human labels or expert demonstrations to generate this data. In this paper, we describe a method that grounds affordances in physical interactions instead, thus removing the need for human labels or expert policies. We use an efficient sampling-based method to generate successful trajectories that provide contact data, which are then used to reveal affordance representations. Our framework, GIFT, operates in two phases: first, we discover visual affordances from goal-directed interaction with a set of procedurally generated tools; second, we train a model to predict new instances of the discovered affordances on novel tools in a self-supervised fashion. In our experiments, we show that GIFT can leverage a sparse keypoint representation to predict grasp and interaction points that accommodate multiple tasks, such as hooking, reaching, and hammering. GIFT outperforms baselines on all tasks and matches a human oracle on two of three tasks using novel tools. Qualitative results available at: www.pair.toronto.edu/gift-tools-rss21.
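To make the two-phase pipeline concrete, the sketch below outlines it in Python. This is a minimal sketch under assumptions, not the authors' implementation: every name here (Trajectory, sample_successful_trajectories, extract_affordance_keypoints, train_keypoint_predictor, generate_procedural_tools) is a hypothetical placeholder standing in for components the abstract only describes at a high level.

```python
# Hypothetical sketch of GIFT's two-phase structure as described in the abstract.
# All identifiers are placeholders; the paper's actual interfaces are not specified here.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """Contact data recorded from one successful goal-directed interaction."""
    tool_points: List[Tuple[float, float, float]]     # observed tool geometry
    contact_points: List[Tuple[float, float, float]]  # tool/environment contacts
    grasp_point: Tuple[float, float, float]           # where the tool was held


def sample_successful_trajectories(tool, task, n: int = 100) -> List[Trajectory]:
    """Phase 1 data collection: sampling-based search for trajectories that
    complete `task` with `tool` (placeholder for the paper's planner)."""
    return []  # stub


def extract_affordance_keypoints(trajectories: List[Trajectory]):
    """Phase 1 discovery: distill contact data from successful trajectories
    into a sparse keypoint representation (grasp and interaction points)."""
    return []  # stub


def train_keypoint_predictor(dataset):
    """Phase 2: self-supervised training of a model that predicts the
    discovered keypoints from observations of novel tools."""
    return None  # stub


def generate_procedural_tools(n: int):
    """Placeholder for the set of procedurally generated training tools."""
    return []  # stub


# Phase 1: discover affordances through goal-directed interaction.
dataset = []
for tool in generate_procedural_tools(n=500):
    for task in ("hooking", "reaching", "hammering"):
        trajectories = sample_successful_trajectories(tool, task)
        keypoints = extract_affordance_keypoints(trajectories)
        dataset.append((tool, task, keypoints))

# Phase 2: train a predictor so the discovered affordances transfer to unseen tools.
model = train_keypoint_predictor(dataset)
```

The separation mirrors the abstract: the first loop turns interaction experience into supervision (keypoints grounded in contact), and the second phase trains a perception model so no human labels or expert policies are needed at any stage.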
