GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels

Tool use requires reasoning about the fit between an object's affordances and the demands of a task. Visual affordance learning can benefit from goal-directed interaction experience, but current techniques rely on human labels or expert demonstrations to generate this data. In this paper, we describe a method that grounds affordances in physical interactions instead, thus removing the need for human labels or expert policies. We use an efficient sampling-based method to generate successful trajectories that provide contact data, which are then used to reveal affordance representations. Our framework, GIFT, operates in two phases: first, we discover visual affordances from goal-directed interaction with a set of procedurally generated tools; second, we train a model to predict new instances of the discovered affordances on novel tools in a self-supervised fashion. In our experiments, we show that GIFT can leverage a sparse keypoint representation to predict grasp and interaction points that accommodate multiple tasks, such as hooking, reaching, and hammering. GIFT outperforms baselines on all tasks and matches a human oracle on two of three tasks using novel tools. Qualitative results available at: www.pair.toronto.edu/gift-tools-rss21.
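To make the two-phase pipeline concrete, the sketch below outlines it in Python. This is a minimal sketch under assumptions, not the authors' implementation: every name here (Trajectory, sample_successful_trajectories, extract_affordance_keypoints, train_keypoint_predictor, generate_procedural_tools) is a hypothetical placeholder standing in for components the abstract only describes at a high level.

```python
# Hypothetical sketch of GIFT's two-phase structure as described in the abstract.
# All identifiers are placeholders; the paper's actual interfaces are not specified here.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """Contact data recorded from one successful goal-directed interaction."""
    tool_points: List[Tuple[float, float, float]]     # observed tool geometry
    contact_points: List[Tuple[float, float, float]]  # tool/environment contacts
    grasp_point: Tuple[float, float, float]           # where the tool was held


def sample_successful_trajectories(tool, task, n: int = 100) -> List[Trajectory]:
    """Phase 1 data collection: sampling-based search for trajectories that
    complete `task` with `tool` (placeholder for the paper's planner)."""
    return []  # stub


def extract_affordance_keypoints(trajectories: List[Trajectory]):
    """Phase 1 discovery: distill contact data from successful trajectories
    into a sparse keypoint representation (grasp and interaction points)."""
    return []  # stub


def train_keypoint_predictor(dataset):
    """Phase 2: self-supervised training of a model that predicts the
    discovered keypoints from observations of novel tools."""
    return None  # stub


def generate_procedural_tools(n: int):
    """Placeholder for the set of procedurally generated training tools."""
    return []  # stub


# Phase 1: discover affordances through goal-directed interaction.
dataset = []
for tool in generate_procedural_tools(n=500):
    for task in ("hooking", "reaching", "hammering"):
        trajectories = sample_successful_trajectories(tool, task)
        keypoints = extract_affordance_keypoints(trajectories)
        dataset.append((tool, task, keypoints))

# Phase 2: train a predictor so the discovered affordances transfer to unseen tools.
model = train_keypoint_predictor(dataset)
```

The separation mirrors the abstract: the first loop turns interaction experience into supervision (keypoints grounded in contact), and the second phase trains a perception model so no human labels or expert policies are needed at any stage.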
