Seeing Glass: Joint Point Cloud and Depth Completion for Transparent Objects

The basis of many object manipulation algorithms is RGB-D input. Yet, commodity RGB-D sensors can only provide distorted depth maps for a wide range of transparent objects due to light refraction and absorption. To tackle the perception challenges posed by transparent objects, we propose TranspareNet, a joint point cloud and depth completion method that can complete the depth of transparent objects in cluttered and complex scenes, even when the vessels are partially filled with fluid. To address the shortcomings of existing transparent object data collection schemes in the literature, we also propose an automated dataset creation workflow that consists of robot-controlled image collection and vision-based automatic annotation. Through this automated workflow, we created the Toronto Transparent Objects Depth Dataset (TODD), which consists of nearly 15,000 RGB-D images. Our experimental evaluation demonstrates that TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets, including ClearGrasp, and that it also handles cluttered scenes when trained on TODD. Code and dataset will be released at https://www.pair.toronto.edu/TranspareNet/
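To make the sensing problem concrete, below is a minimal sketch of the preprocessing step common to depth-completion pipelines for transparent objects: depth readings inside transparent-object regions are treated as invalid and zeroed out, producing the incomplete depth map that a completion network is then asked to fill in. The function name, array shapes, and threshold here are illustrative assumptions, not details from the paper.

```python
import numpy as np

def mask_transparent_depth(depth: np.ndarray, transparent_mask: np.ndarray) -> np.ndarray:
    """Invalidate sensor depth inside transparent regions.

    depth:            HxW float32 depth map in meters (0 = missing reading).
    transparent_mask: HxW boolean mask marking transparent-object pixels
                      (e.g., from a segmentation model).
    Returns a copy of `depth` with transparent pixels set to 0, the usual
    "missing" convention fed to depth-completion networks.
    """
    masked = depth.copy()
    masked[transparent_mask] = 0.0
    return masked

# Toy 2x2 example: the bottom-left pixel lies on a glass surface, so its
# (distorted) reading of 0.4 m is removed before completion.
depth = np.array([[1.2, 1.3],
                  [0.4, 1.1]], dtype=np.float32)
mask = np.array([[False, False],
                 [True,  False]])
incomplete = mask_transparent_depth(depth, mask)
```

A completion network such as TranspareNet would then take the RGB image together with `incomplete` and predict the missing depth values.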
