LIDAR and Monocular Camera Fusion: On-road Depth Completion for Autonomous Driving

LIDAR and RGB cameras are commonly used sensors on autonomous vehicles, but each has limitations: LIDAR provides accurate depth measurements yet is sparse in both vertical and horizontal resolution, while RGB images provide dense texture but no depth information. In this paper, we fuse LIDAR and RGB images with a deep neural network that completes a denser, pixel-wise depth map. The proposed architecture reconstructs the pixel-wise depth map by exploiting both the dense color features and the sparse 3D spatial features. We apply an early-fusion technique and fine-tune a ResNet model as the encoder. A Residual Up-Projection block recovers the spatial resolution of the feature maps and captures context within the depth map. We introduce a depth feature tensor that propagates context information from encoder blocks to decoder blocks. The proposed method is evaluated on the large-scale indoor NYUDepthV2 dataset and the KITTI odometry dataset, where it outperforms the state-of-the-art method for fusing a single RGB image with sparse depth. It is also evaluated on a reduced-resolution KITTI dataset that simulates the fusion of a planar LIDAR with an RGB camera.
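To make the described pipeline concrete, the following is a minimal PyTorch sketch of an early-fusion depth-completion network of this kind: the sparse LIDAR depth map is concatenated with the RGB image as a fourth input channel (early fusion), a ResNet backbone serves as the encoder, residual up-projection blocks recover spatial resolution, and encoder feature tensors are forwarded to the matching decoder stages. The class names, channel widths, skip-connection placement, and the ResNet-18 choice are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only; hyperparameters are assumptions, not the paper's.
import torch
import torch.nn as nn
import torchvision.models as models


class ResidualUpProjection(nn.Module):
    """Doubles spatial resolution; a projected residual branch keeps context."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.proj = nn.Conv2d(in_ch, out_ch, 5, padding=2)  # residual projection
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.up(x)
        return self.relu(self.main(x) + self.proj(x))


class FusionDepthNet(nn.Module):
    """Early fusion: RGB (3 ch) + sparse depth (1 ch) -> dense depth (1 ch)."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)  # the paper fine-tunes a ResNet
        # Replace the stem so it accepts the 4-channel fused input.
        self.conv1 = nn.Conv2d(4, 64, 7, stride=2, padding=3, bias=False)
        self.bn1, self.relu, self.maxpool = resnet.bn1, resnet.relu, resnet.maxpool
        self.layer1, self.layer2 = resnet.layer1, resnet.layer2
        self.layer3, self.layer4 = resnet.layer3, resnet.layer4
        # Decoder: up-projection blocks progressively restore resolution.
        self.up4 = ResidualUpProjection(512, 256)
        self.up3 = ResidualUpProjection(256, 128)
        self.up2 = ResidualUpProjection(128, 64)
        self.up1 = ResidualUpProjection(64, 64)
        self.out = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)  # early fusion: 4-channel input
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        e1 = self.layer1(x)   # 1/4 resolution, 64 ch
        e2 = self.layer2(e1)  # 1/8 resolution, 128 ch
        e3 = self.layer3(e2)  # 1/16 resolution, 256 ch
        e4 = self.layer4(e3)  # 1/32 resolution, 512 ch
        # Encoder feature tensors propagate context into the decoder.
        d = self.up4(e4) + e3
        d = self.up3(d) + e2
        d = self.up2(d) + e1
        d = self.up1(d)
        return self.out(d)    # dense depth map at 1/2 input resolution
```

Under these assumptions, a call such as `FusionDepthNet()(rgb, sparse_depth)` with `rgb` of shape (B, 3, H, W) and `sparse_depth` of shape (B, 1, H, W) yields a dense depth map; unmeasured pixels in the sparse map are simply zeros in the fourth input channel.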
