Learning Joint 2D-3D Representations for Depth Completion

In this paper, we tackle the problem of depth completion from RGBD data. Towards this goal, we design a simple yet effective neural network block that learns to extract joint 2D and 3D features. Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points, with their output features fused in image space. We build the depth completion network simply by stacking the proposed block, which has the advantage of learning hierarchical representations that are fully fused between 2D and 3D spaces at multiple levels. We demonstrate the effectiveness of our approach on the challenging KITTI depth completion benchmark and show that our approach outperforms the state-of-the-art.

[1]  Germán Ros,et al.  CARLA: An Open Urban Driving Simulator , 2017, CoRL.

[2]  M. Pollefeys,et al.  DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Ce Liu,et al.  Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Sertac Karaman,et al.  Self-Supervised Sparse-to-Dense: Self-Supervised Depth Completion from LiDAR and Monocular Camera , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[5]  Marc Pollefeys,et al.  Semantically Guided Depth Upsampling , 2016, GCPR.

[6]  Klaus Diepold,et al.  Dense disparity maps from sparse disparity measurements , 2011, 2011 International Conference on Computer Vision.

[7]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Thomas Brox,et al.  Sparsity Invariant CNNs , 2017, 2017 International Conference on 3D Vision (3DV).

[9]  Meng Wang,et al.  2D-to-3D image conversion by learning depth from examples , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[10]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[11]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[12]  Raquel Urtasun,et al.  Deep Parametric Continuous Convolutional Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Heiko Hirschmüller,et al.  Stereo Processing by Semiglobal Matching and Mutual Information , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Anelia Angelova,et al.  Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[16]  Sinisa Todorovic,et al.  Monocular Depth Estimation Using Neural Regression Forest , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jianxiong Xiao,et al.  Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[18]  Simon Lucey,et al.  Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Michael Felsberg,et al.  Confidence Propagation through CNNs for Guided Sparse Depth Regression , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[22]  Wilfried Philips,et al.  Learning Morphological Operators for Depth Completion , 2018, ACIVS.

[23]  Jianxiong Xiao,et al.  Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Luc Van Gool,et al.  Sparse and Noisy LiDAR Completion with RGB Guidance and Uncertainty , 2019, 2019 16th International Conference on Machine Vision Applications (MVA).

[25]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Sertac Karaman,et al.  Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Danfei Xu,et al.  PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Stefano Soatto,et al.  Dense Depth Posterior (DDP) From Single Image and Sparse Range , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Bin Yang,et al.  Deep Continuous Fusion for Multi-sensor 3D Object Detection , 2018, ECCV.

[32]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Xuming He,et al.  Discrete-Continuous Depth Estimation from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Truong Q. Nguyen,et al.  Depth Reconstruction From Sparse Samples: Representation, Algorithm, and Sampling , 2014, IEEE Transactions on Image Processing.

[37]  Ruigang Yang,et al.  Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network , 2018, ECCV.

[38]  Fawzi Nashashibi,et al.  Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation , 2018, 2018 International Conference on 3D Vision (3DV).

[39]  Silvio Savarese,et al.  DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.