Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera

Modern high-definition LIDAR remains too expensive for many commercial autonomous driving vehicles and small indoor robots. An affordable alternative is to fuse planar LIDAR with RGB images to provide a similar level of perception capability. Although state-of-the-art methods can predict depth from limited sensor input, they usually just concatenate sparse LIDAR features and dense RGB features within an end-to-end fusion architecture. In this paper, we introduce an inductive late-fusion block, inspired by a probability model, that better fuses the different sensor modalities. The proposed demonstration and aggregation network propagates the mixed context and depth features to the prediction network, where they serve as prior knowledge for depth completion. This late-fusion block uses the dense context features to guide depth prediction based on demonstrations from the sparse depth features. In addition to evaluating the proposed method on benchmark depth completion datasets including NYUDepthV2 and KITTI, we also test it on a simulated planar LIDAR dataset. Our method shows promising results compared to previous approaches on both the benchmark datasets and the simulated dataset with various 3D densities.
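
To make the notion of a simulated planar LIDAR input concrete, the following is a minimal sketch of one way to derive a sparse depth map from a dense one. It is not the simulation procedure used in the paper: the function names `simulate_planar_lidar` and `density` are hypothetical, and the sketch assumes each planar scan projects onto a single image row, which ignores the real sensor-to-camera geometry.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Mask = List[List[int]]


def simulate_planar_lidar(dense_depth: Grid, scan_rows: Sequence[int]) -> Tuple[Grid, Mask]:
    """Keep depth only on the given image rows (assumed scan planes).

    Pixels outside the scan rows are set to 0.0, meaning "no measurement".
    Returns the sparse depth map and a 0/1 validity mask of the same shape.
    """
    keep = set(scan_rows)
    sparse: Grid = []
    mask: Mask = []
    for r, row in enumerate(dense_depth):
        if r in keep:
            # Copy the row; a depth of 0 or less is treated as an invalid return.
            sparse.append([float(d) if d > 0 else 0.0 for d in row])
            mask.append([1 if d > 0 else 0 for d in row])
        else:
            sparse.append([0.0] * len(row))
            mask.append([0] * len(row))
    return sparse, mask


def density(mask: Mask) -> float:
    """Fraction of pixels that carry a valid depth measurement."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


if __name__ == "__main__":
    # Toy 4x3 dense depth map (metres), with one scan plane on row 1.
    dense = [[2.0, 2.0, 2.0] for _ in range(4)]
    sparse, mask = simulate_planar_lidar(dense, scan_rows=[1])
    print(sparse)
    print(density(mask))
```

Adding more entries to `scan_rows` raises the input density, which is one simple way to produce inputs at several sparsity levels for comparison.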
