AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions

Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, pose particular challenges for future home-assistant robots performing daily tasks in human environments. Beyond parsing the articulated parts and joint parameters, researchers have recently advocated learning manipulation affordance over the input shape geometry, which is more task-aware and geometrically fine-grained. However, taking only passive observations as input, these methods ignore many hidden but important kinematic constraints (e.g., joint location and limits) and dynamic factors (e.g., joint friction and restitution), and therefore lose significant accuracy on test cases with such uncertainties. In this paper, we propose a novel framework, named AdaAfford, that learns to perform very few test-time interactions to quickly adapt the affordance priors to more accurate instance-specific posteriors. We conduct large-scale experiments using the PartNet-Mobility dataset and demonstrate that our system outperforms baseline methods.
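
To make the adaptation idea concrete, the following is a minimal, hypothetical sketch of such a few-shot test-time interaction loop. The module names (AffordanceNet, AdaptationNet), input shapes, uncertainty heuristic, and the `interact` callback are illustrative assumptions for exposition only, not the paper's actual architecture.

```python
# Hypothetical sketch of few-shot test-time affordance adaptation; all names
# and design details below are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffordanceNet(nn.Module):
    """Maps per-point coordinates plus an interaction context to affordance scores."""
    def __init__(self, feat_dim=128, ctx_dim=32):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(3 + ctx_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 1))

    def forward(self, pts, ctx):                    # pts: (N, 3), ctx: (ctx_dim,)
        ctx = ctx.expand(pts.shape[0], -1)          # broadcast context to every point
        return torch.sigmoid(self.head(torch.cat([pts, ctx], dim=-1))).squeeze(-1)

class AdaptationNet(nn.Module):
    """Encodes one (contact point, action, observed outcome) into a context update."""
    def __init__(self, ctx_dim=32):
        super().__init__()
        self.ctx_dim = ctx_dim
        self.enc = nn.Sequential(
            nn.Linear(3 + 3 + 1, 64), nn.ReLU(), nn.Linear(64, ctx_dim))

    def forward(self, point, action, outcome):      # (3,), (3,), (1,)
        return self.enc(torch.cat([point, action, outcome]))

def adapt(pts, aff_net, ada_net, interact, budget=3):
    """Refine the affordance prior with `budget` probing interactions.

    `interact(point, action)` queries the simulator or robot and returns the
    observed outcome (a (1,) tensor here, e.g., the resulting part motion).
    """
    ctx = torch.zeros(ada_net.ctx_dim)              # prior: no interactions observed yet
    for _ in range(budget):
        scores = aff_net(pts, ctx)                  # current posterior estimate, (N,)
        i = torch.argmin((scores - 0.5).abs())      # probe the most uncertain point
        action = F.normalize(torch.randn(3), dim=0) # e.g., a pulling direction to try
        outcome = interact(pts[i], action)          # observed effect of the probe
        ctx = ctx + ada_net(pts[i], action, outcome)  # fold new evidence into context
    return aff_net(pts, ctx)                        # instance-specific posterior affordance

# Example usage with a random point cloud and a dummy simulator callback:
# pts = torch.rand(2048, 3)
# posterior = adapt(pts, AffordanceNet(), AdaptationNet(),
#                   interact=lambda p, a: torch.rand(1))
```

The design choice mirrored here is that interaction outcomes are folded into a compact context vector, so affordance prediction can be conditioned on evidence gathered at test time without any retraining.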
