Paper information - Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera


Depth completion, the technique of estimating a dense depth image from sparse depth measurements, has a variety of applications in robotics and autonomous driving. However, depth completion faces three main challenges: the irregularly spaced pattern in the sparse depth input, the difficulty in handling multiple sensor modalities (when color images are available), and the lack of dense, pixel-level ground truth depth labels for training. In this work, we address all these challenges. Specifically, we develop a deep regression model to learn a direct mapping from sparse depth (and color images) input to dense depth prediction. We also propose a self-supervised training framework that requires only sequences of color and sparse depth images, without the need for dense depth labels. Our experiments demonstrate that the self-supervised framework outperforms a number of existing solutions trained with semi-dense annotations. Furthermore, when trained with semi-dense annotations, our network attains state-of-the-art accuracy and is the winning approach on the KITTI depth completion benchmark (http://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_completion) at the time of submission.
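The abstract does not spell out the training objective, so the following is only a minimal sketch of one ingredient a label-free framework of this kind could plausibly use: a depth loss evaluated only at pixels where the sparse input actually carries a measurement. The function name `masked_depth_loss`, the convention that a value of 0 marks a missing measurement, and the choice of squared error are all assumptions made for illustration, not the authors' implementation.

```python
def masked_depth_loss(pred, sparse):
    """Mean squared error over pixels that carry a sparse depth measurement.

    `pred` and `sparse` are 2D lists of floats with the same shape.
    Assumed convention: a value of 0 in `sparse` means "no measurement".
    """
    total, count = 0.0, 0
    for pred_row, sparse_row in zip(pred, sparse):
        for p, s in zip(pred_row, sparse_row):
            if s > 0:  # only measured pixels contribute to the loss
                total += (p - s) ** 2
                count += 1
    # With no valid pixel at all, return 0 rather than dividing by zero.
    return total / count if count else 0.0


# Toy 2x2 example: only the right-hand column has sparse measurements.
pred = [[10.0, 12.0], [8.0, 20.0]]
sparse = [[0.0, 11.0], [0.0, 22.0]]
loss = masked_depth_loss(pred, sparse)
```

In this toy case the two unmeasured pixels are ignored, so the prediction there is unconstrained by this term alone; any further terms that would constrain it are not described in the abstract.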
