Sparse Depth Completion with Semantic Mesh Deformation Optimization

Sparse depth measurements are widely available in many applications such as augmented reality, visual inertial odometry and robots equipped with low cost depth sensors. Although such sparse depth samples work well for certain applications like motion tracking, a complete depth map is usually preferred for broader applications, such as 3D object recognition, 3D reconstruction and autonomous driving. Despite the recent advancements in depth prediction from single RGB images with deeper neural networks, the existing approaches do not yield reliable results for practical use. In this work, we propose a neural network with post-optimization, which takes an RGB image and sparse depth samples as input and predicts the complete depth map. We make three major contributions to advance the state-of-the-art: an improved backbone network architecture named EDNet, a semantic edge-weighted loss function and a semantic mesh deformation optimization method. Our evaluation results outperform the existing work consistently on both indoor and outdoor datasets, and it significantly reduces the mean average error by up to 19.5% under the same settings of 200 sparse samples on NYU-Depth-

[1]  Hujun Bao,et al.  Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Ce Liu,et al.  Depth Extraction from Video Using Non-parametric Sampling , 2012, ECCV.

[3]  Marc Alexa,et al.  As-rigid-as-possible surface modeling , 2007, Symposium on Geometry Processing.

[4]  David Kim,et al.  DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality , 2020, UIST.

[5]  D. T. Lee,et al.  Two algorithms for constructing a Delaunay triangulation , 1980, International Journal of Computer & Information Sciences.

[6]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Bolei Zhou,et al.  Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.

[8]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Michael M. Kazhdan,et al.  Poisson surface reconstruction , 2006, SGP '06.

[10]  Sertac Karaman,et al.  Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[12]  Suya You,et al.  Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion , 2019, NeurIPS.

[13]  Junsong Yuan,et al.  Depth camera based hand gesture recognition and its applications in Human-Computer-Interaction , 2011, 2011 8th International Conference on Information, Communications & Signal Processing.

[14]  Meng Wang,et al.  2D-to-3D image conversion by learning depth from examples , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[15]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[16]  Yan Wang,et al.  Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Feifei Gu,et al.  A Joint Calibration Method for the 3D Sensing System Composed with ToF and Stereo Camera , 2018, 2018 IEEE International Conference on Information and Automation (ICIA).

[18]  J. Hecht Lidar for Self-Driving Cars , 2018 .

[19]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[20]  Kyungdon Joo,et al.  Non-Local Spatial Propagation Network for Depth Completion , 2020, ECCV.

[21]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[23]  Ruigang Yang,et al.  Learning Depth with Convolutional Spatial Propagation Network , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Sinisa Todorovic,et al.  Monocular Depth Estimation Using Neural Regression Forest , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Shinpei Kato,et al.  Non-Guided Depth Completion with Adversarial Networks , 2018, 2018 21st International Conference on Intelligent Transportation Systems (ITSC).

[26]  Yong Liu,et al.  Parse geometry from a line: Monocular depth estimation with partial laser observation , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Jana Kosecka,et al.  Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[28]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.

[29]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.