CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion

Depth completion converts a sparse depth map into a dense one, given the corresponding color image. The convolutional spatial propagation network (CSPN) is one of the state-of-the-art (SoTA) methods for depth completion, recovering structural details of the scene. In this paper, we propose CSPN++, which further improves effectiveness and efficiency by learning adaptive convolutional kernel sizes and numbers of propagation iterations, so that the context and computational resources needed at each pixel can be allocated dynamically on demand. Specifically, we formulate the learning of these two hyper-parameters as an architecture-selection problem: various configurations of kernel sizes and iteration counts are first defined, and a set of soft weighting parameters is then trained to either assemble or select from the pre-defined configurations at each pixel. In our experiments, we find that weighted assembly, which we refer to as "context-aware CSPN", yields significant accuracy improvements, while weighted selection, "resource-aware CSPN", reduces computational cost significantly with similar or better accuracy. Moreover, the resources used by CSPN++ can be adjusted automatically with respect to the computational budget. Finally, to avoid side effects from noisy or inaccurate sparse depths, we embed a gated network inside CSPN++, which further improves performance. We demonstrate the effectiveness of CSPN++ on the KITTI depth completion benchmark, where it significantly outperforms CSPN and other SoTA methods.
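The per-pixel weighted-assembly idea can be sketched minimally in NumPy: propagation is run under several candidate kernel-size/iteration configurations, and a softmax over per-pixel logits blends the candidate outputs. This is an illustrative assumption, not the paper's implementation: `propagate` here uses a uniform box average as a stand-in for the learned affinity propagation, and the logits are random where the paper would predict them with a CNN.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def propagate(depth, kernel_size, iterations):
    """Hypothetical stand-in for CSPN propagation: each iteration replaces
    every pixel with the mean of its k x k neighbourhood (edge-padded).
    The real CSPN uses learned per-pixel affinity weights instead."""
    k, pad = kernel_size, kernel_size // 2
    out = depth.copy()
    h, w = depth.shape
    for _ in range(iterations):
        padded = np.pad(out, pad, mode="edge")
        acc = np.zeros_like(out)
        for dy in range(k):
            for dx in range(k):
                acc += padded[dy:dy + h, dx:dx + w]
        out = acc / (k * k)
    return out

np.random.seed(0)
depth = np.random.rand(8, 8)                 # toy "sparse-to-dense" input
configs = [(3, 2), (5, 2), (7, 2)]           # (kernel size, iterations)

# Run every candidate configuration, then assemble per pixel.
candidates = np.stack([propagate(depth, k, t) for k, t in configs])
logits = np.random.randn(len(configs), 8, 8)  # would come from a CNN head
weights = softmax(logits, axis=0)             # per-pixel config weights
assembled = (weights * candidates).sum(axis=0)
```

Because the weights form a convex combination at every pixel, the assembled depth always lies within the range spanned by the candidate outputs; hard per-pixel argmax over the same logits would give the resource-aware selection variant, running only the chosen configuration.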
