Learning Neural Implicit Functions as Object Representations for Robotic Manipulation

Robotic manipulation planning is the problem of finding a sequence of robot configurations that involves interactions with objects in the scene, such as grasping, placement, and tool use. To achieve such interactions, traditional approaches rely on hand-designed features and object representations, and it remains an open question how to describe interactions with arbitrary objects in a flexible and efficient way. Inspired by recent advances in 3D modeling, e.g., NeRF, we propose a method to represent objects as neural implicit functions upon which interaction constraint functions can be defined and jointly trained. The proposed pixel-aligned representation is inferred directly from camera images with known camera geometry, and thus naturally serves as the perception component of the manipulation pipeline while also enabling sequential robot manipulation planning.
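
To make the idea concrete, the sketch below (illustrative only, not the paper's implementation) shows how a pixel-aligned implicit function could be queried in PyTorch: an image encoder produces a feature map, 3D query points are projected into the image using the known camera intrinsics K and extrinsics T_cam_world, the corresponding pixel-aligned features are sampled, and a shared MLP decodes them into an occupancy value and an interaction-constraint score. All module names, dimensions, and the pinhole camera convention are assumptions made for this example.

```python
# Minimal sketch (not the authors' code) of a pixel-aligned neural implicit
# function with an additional interaction-constraint head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedImplicitFn(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Toy convolutional encoder standing in for a U-Net/ResNet backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, padding=1),
        )
        # Shared MLP over (pixel-aligned feature, 3D point in camera frame).
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.occ_head = nn.Linear(128, 1)         # occupancy logit
        self.constraint_head = nn.Linear(128, 1)  # e.g. grasp-feasibility score

    def forward(self, image, points_world, K, T_cam_world):
        """image: (B,3,H,W); points_world: (B,N,3); K: (B,3,3); T_cam_world: (B,4,4)."""
        feat_map = self.encoder(image)                                # (B,C,H,W)
        # Transform query points into the camera frame: x_cam = T_cam_world @ x_world.
        ones = torch.ones_like(points_world[..., :1])
        p_cam = (torch.cat([points_world, ones], -1)
                 @ T_cam_world.transpose(1, 2))[..., :3]              # (B,N,3)
        # Pinhole projection to pixel coordinates, then to [-1,1] for grid_sample.
        uvw = p_cam @ K.transpose(1, 2)
        uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-6)
        H, W = image.shape[-2:]
        grid = torch.stack([2 * uv[..., 0] / (W - 1) - 1,
                            2 * uv[..., 1] / (H - 1) - 1], -1).unsqueeze(2)  # (B,N,1,2)
        feat = F.grid_sample(feat_map, grid, align_corners=True)      # (B,C,N,1)
        feat = feat.squeeze(-1).permute(0, 2, 1)                      # (B,N,C)
        h = self.mlp(torch.cat([feat, p_cam], -1))
        return self.occ_head(h), self.constraint_head(h)
```

Because both outputs are differentiable with respect to the query points, a constraint head of this kind could in principle be used directly as a cost or constraint inside a gradient-based manipulation planner.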
