HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion

Dense depth cues are important and have wide applications in various computer vision tasks. In autonomous driving, LIDAR sensors are adopted to acquire depth measurements around the vehicle to perceive the surrounding environments. However, depth maps obtained by LIDAR are generally sparse because of its hardware limitation. The task of depth completion attracts increasing attention, which aims at generating a dense depth map from an input sparse depth map. To effectively utilize multi-scale features, we propose three novel sparsity-invariant operations, based on which, a sparsity-invariant multi-scale encoder-decoder network (HMS-Net) for handling sparse inputs and sparse feature maps is also proposed. Additional RGB features could be incorporated to further improve the depth completion performance. Our extensive experiments and component analysis on two public benchmarks, KITTI depth completion benchmark and NYU-depth-v2 dataset, demonstrate the effectiveness of the proposed approach. As of Aug. 12th, 2018, on KITTI depth completion leaderboard, our proposed model without RGB guidance ranks 1st among all peer-reviewed methods without using RGB information, and our model with RGB guidance ranks 2nd among all RGB-guided methods.

[1]  Stefano Soatto,et al.  Dense Depth Posterior (DDP) From Single Image and Sparse Range , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Marc Pollefeys,et al.  Semantically Guided Depth Upsampling , 2016, GCPR.

[3]  Luc Van Gool,et al.  Sparse and Noisy LiDAR Completion with RGB Guidance and Uncertainty , 2019, 2019 16th International Conference on Machine Vision Applications (MVA).

[4]  E. Nadaraya On Estimating Regression , 1964 .

[5]  Hui Lin,et al.  Depth image enhancement for Kinect using region growing and bilateral filter , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[6]  Sertac Karaman,et al.  Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Luis Miguel Bergasa,et al.  Efficient ConvNet for real-time semantic segmentation , 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[9]  Fawzi Nashashibi,et al.  Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation , 2018, 2018 International Conference on 3D Vision (3DV).

[10]  Horst Bischof,et al.  Variational Depth Superresolution Using Example-Based Edge Representations , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Zhanyi Hu,et al.  Learning Depth From Single Images With Deep Neural Network Embedding Focal Length , 2018, IEEE Transactions on Image Processing.

[12]  Bastian Leibe,et al.  Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[14]  Ruigang Yang,et al.  Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network , 2018, ECCV.

[15]  Dmitriy Vatolin,et al.  Temporal filtering for depth maps generated by Kinect depth camera , 2011, 2011 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON).

[16]  Jonathan T. Barron,et al.  The Fast Bilateral Solver , 2015, ECCV.

[17]  Gabriel J. Brostow,et al.  Patch Based Synthesis for Single Depth Image Super-Resolution , 2012, ECCV.

[18]  Yong Liu,et al.  Parse geometry from a line: Monocular depth estimation with partial laser observation , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Narendra Ahuja,et al.  Deep Joint Image Filtering , 2016, ECCV.

[20]  Steven Lake Waslander,et al.  In Defense of Classical Image Processing: Fast Depth Completion on the CPU , 2018, 2018 15th Conference on Computer and Robot Vision (CRV).

[21]  Xiaoou Tang,et al.  Depth Map Super-Resolution by Deep Multi-Scale Guidance , 2016, ECCV.

[22]  Rogério Schmidt Feris,et al.  Joint Super Resolution and Denoising From a Single Depth Image , 2015, IEEE Transactions on Multimedia.

[23]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Yao Wang,et al.  Color-Guided Depth Recovery From RGB-D Data Using an Adaptive Autoregressive Model , 2014, IEEE Transactions on Image Processing.

[25]  Yinda Zhang,et al.  Deep Depth Completion of a Single RGB-D Image , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Simon Lucey,et al.  Deep Convolutional Compressed Sensing for LiDAR Depth Completion , 2018, ACCV.

[27]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[28]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[29]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Michael Felsberg,et al.  Confidence Propagation through CNNs for Guided Sparse Depth Regression , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  M. Pollefeys,et al.  DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Sertac Karaman,et al.  Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[33]  Thomas Brox,et al.  Sparsity Invariant CNNs , 2017, 2017 International Conference on 3D Vision (3DV).

[34]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[35]  Huazhu Fu,et al.  Hierarchical Features Driven Residual Learning for Depth Map Super-Resolution , 2019, IEEE Transactions on Image Processing.

[36]  Michael Felsberg,et al.  Propagating Confidences through CNNs for Sparse Data Regression , 2018, BMVC.

[37]  Pascal Frossard,et al.  Global Auto-Regressive Depth Recovery via Iterative Non-Local Filtering , 2019, IEEE Transactions on Broadcasting.

[38]  Horst Bischof,et al.  ATGV-Net: Accurate Depth Super-Resolution , 2016, ECCV.

[39]  Rogério Schmidt Feris,et al.  Edge guided single depth image super resolution , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[40]  Luis Salgado,et al.  Efficient spatio-temporal hole filling strategy for Kinect depth maps , 2012, Electronic Imaging.

[41]  Stefano Soatto,et al.  Conditional Prior Networks for Optical Flow , 2018, ECCV.

[42]  Horst Bischof,et al.  Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation , 2013, 2013 IEEE International Conference on Computer Vision.

[43]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[45]  Bin Yang,et al.  SBNet: Sparse Blocks Network for Fast Inference , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.