Spatiotemporal Guided Self-Supervised Depth Completion from LiDAR and Monocular Camera

Depth completion aims to estimate dense depth maps from sparse depth measurements. It has become increasingly important in autonomous driving and has therefore drawn wide attention. In this paper, we introduce photometric losses in both the spatial and temporal domains to jointly guide self-supervised depth completion. The method performs accurate end-to-end depth completion using LiDAR and a monocular camera. In particular, we fully exploit the consistent information in temporally adjacent frames and in stereo views to improve the accuracy of depth completion during model training. We design a self-supervised framework that eliminates the negative effects of moving objects and of regions with smooth gradients. Experiments are conducted on the KITTI benchmark, and the results indicate that our self-supervised method attains competitive performance.
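As a concrete illustration of the spatiotemporal photometric supervision described above, the following is a minimal PyTorch sketch, not the authors' implementation: it combines an SSIM + L1 photometric error with a per-pixel minimum over the warped temporal neighbors and the warped stereo view, plus an identity-reprojection mask to suppress moving objects and low-gradient regions. The 3x3 SSIM window, the weighting alpha = 0.85, and the min/auto-mask heuristic are illustrative assumptions borrowed from related self-supervised depth work; the warped views are assumed to be produced elsewhere by depth- and pose-based image warping.

```python
# Minimal sketch of a spatiotemporal photometric loss (assumptions noted above).
import torch
import torch.nn.functional as F

def ssim_dissimilarity(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM dissimilarity (1 - SSIM) / 2 over a 3x3 neighborhood."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_error(target, warped, alpha=0.85):
    """Weighted SSIM + L1 photometric error map, shape (B, 1, H, W)."""
    l1 = (target - warped).abs().mean(1, keepdim=True)
    dssim = ssim_dissimilarity(target, warped).mean(1, keepdim=True)
    return alpha * dssim + (1 - alpha) * l1

def spatiotemporal_loss(target, warped_views, source_views, alpha=0.85):
    """target: (B, 3, H, W) reference frame.
    warped_views: source frames (temporal neighbors and the stereo image)
    warped into the target view using the predicted dense depth and pose.
    source_views: the same frames without warping, for the identity mask.
    """
    # Per-pixel minimum over views: each pixel is supervised by whichever
    # view observes it with the lowest reprojection error (handles occlusion).
    errs = torch.stack([photometric_error(target, w, alpha) for w in warped_views])
    reproj, _ = errs.min(dim=0)
    # Auto-mask: drop pixels whose unwarped error is already lower. These are
    # typically moving objects or textureless, smooth-gradient regions where
    # warping provides no useful photometric signal.
    id_errs = torch.stack([photometric_error(target, s, alpha) for s in source_views])
    identity, _ = id_errs.min(dim=0)
    mask = (reproj < identity).float()
    return (reproj * mask).sum() / mask.sum().clamp(min=1)
```

In this formulation, the pixel-wise minimum lets the temporal and stereo reprojections complement each other, while the identity mask discards pixels whose error does not improve under warping, which is where moving objects and smooth-gradient regions tend to fall.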
