Paper information - GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation


In this paper, we propose a geometric neural network with edge-aware refinement (GeoNet++) that jointly predicts depth and surface normal maps from a single image. Built on two-stream CNNs, GeoNet++ captures the geometric relationships between depth and surface normals through the proposed depth-to-normal and normal-to-depth modules. The "depth-to-normal" module uses the least-squares solution for estimating surface normals from depth to improve normal quality, while the "normal-to-depth" module refines the depth map through kernel regression under the constraints imposed by the surface normals. An edge-aware refinement module further exploits boundary information. As a result, GeoNet++ predicts depth and surface normals with high 3D consistency and sharp boundaries, which yields better reconstructed 3D scenes. GeoNet++ is generic: it can be plugged into other depth/normal prediction frameworks to improve 3D reconstruction quality as well as the pixel-wise accuracy of depth and surface normals. Furthermore, we propose a new 3D geometric metric (3DGM) for evaluating depth prediction in 3D. Whereas existing metrics evaluate pixel-wise error or accuracy, 3DGM measures whether the predicted depth can reconstruct high-quality 3D surface normals, which is a more natural criterion for many 3D applications. Experiments on NYUD-V2 and KITTI demonstrate the effectiveness of our approach.
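The abstract names two geometric relations and a metric without giving formulas. The sketch below shows one plausible reading, under loudly stated assumptions: a pinhole camera with intrinsics `fx, fy, cx, cy`; depth-to-normal as a least-squares plane fit `n^T p = 1` over a small window of back-projected points; normal-to-depth as kernel regression in which each neighbour's tangent plane is intersected with the viewing ray of the centre pixel and weighted by normal similarity; and a 3DGM-style check that reports angular error between normals recovered from depth and ground-truth normals, using the common 11.25°/22.5°/30° thresholds. The function names (`backproject`, `depth_to_normal`, `normal_to_depth`, `normal_angle_stats`), the window `radius`, and the kernel choice are hypothetical. This is not the authors' implementation, which runs these steps inside a trained network.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points (pinhole model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def depth_to_normal(depth, fx, fy, cx, cy, radius=1):
    """Least-squares plane fit n^T p = 1 over each (2r+1)^2 window; border pixels stay zero."""
    points = backproject(depth, fx, fy, cx, cy)
    h, w = depth.shape
    normals = np.zeros_like(points)
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            A = points[i - radius:i + radius + 1, j - radius:j + radius + 1].reshape(-1, 3)
            n, *_ = np.linalg.lstsq(A, np.ones(len(A)), rcond=None)
            n /= np.linalg.norm(n)
            normals[i, j] = -n if n[2] > 0 else n  # orient toward the camera (-z)
    return normals


def normal_to_depth(depth, normals, fx, fy, cx, cy, radius=1):
    """Kernel regression: each neighbour's tangent plane proposes a depth for pixel (i, j)."""
    points = backproject(depth, fx, fy, cx, cy)
    rays = backproject(np.ones_like(depth), fx, fy, cx, cy)  # viewing rays at unit depth
    h, w = depth.shape
    refined = depth.astype(np.float64)  # astype returns a copy
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            num = den = 0.0
            for a in range(i - radius, i + radius + 1):
                for b in range(j - radius, j + radius + 1):
                    n_j = normals[a, b]
                    cos_ray = n_j @ rays[i, j]
                    if abs(cos_ray) < 1e-6:  # ray parallel to plane, or missing normal
                        continue
                    z = (n_j @ points[a, b]) / cos_ray  # ray-plane intersection depth
                    weight = max(float(normals[i, j] @ n_j), 0.0)  # assumed similarity kernel
                    num += weight * z
                    den += weight
            if den > 0:
                refined[i, j] = num / den
    return refined


def normal_angle_stats(pred, gt, mask):
    """3DGM-style check: angular error (degrees) between normals from depth and GT normals."""
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))[mask]
    stats = {"mean": float(err.mean()), "median": float(np.median(err))}
    for t in (11.25, 22.5, 30.0):
        stats[f"within_{t}"] = float((err < t).mean())
    return stats


# Usage on a synthetic tilted plane n0 . P = -2 seen by a toy 7x7 camera.
fx = fy = 5.0
cx = cy = 3.0
n0 = np.array([0.2, 0.1, -1.0])
depth = -2.0 / (backproject(np.ones((7, 7)), fx, fy, cx, cy) @ n0)
normals = depth_to_normal(depth, fx, fy, cx, cy)
mask = np.zeros((7, 7), dtype=bool)
mask[1:-1, 1:-1] = True
gt = np.broadcast_to(n0 / np.linalg.norm(n0), normals.shape)
print(normal_angle_stats(normals, gt, mask)["mean"])  # close to 0 for an exact plane
```

On an exact plane, both directions round-trip: the fitted normals match the plane normal, and feeding them back through `normal_to_depth` reproduces the input depth. A single corrupted depth value is pulled toward its neighbours' plane estimates. The explicit loops favour readability over speed. A network layer would vectorise the window operations instead.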
