To Complete or to Estimate, That is the Question: A Multi-Task Approach to Depth Completion and Monocular Depth Estimation

Robust three-dimensional scene understanding is a growing area of research with direct relevance to many real-world applications, such as autonomous driving and robotic navigation. In this paper, we propose a multi-task learning-based model capable of performing two tasks: sparse depth completion (i.e. generating complete dense scene depth given a sparse depth image as input) and monocular depth estimation (i.e. predicting scene depth from a single RGB image). The model comprises two sub-networks jointly trained end-to-end using data randomly sampled from a publicly available corpus of synthetic and real-world images. The first sub-network generates a sparse depth image by learning lower-level features from the scene, and the second predicts a full dense depth image of the entire scene, leading to a better geometric and contextual understanding of the scene and, as a result, superior performance. The entire model can be used to infer complete scene depth from a single RGB image, or the second sub-network can be used alone to perform depth completion given a sparse depth input. Using adversarial training, a robust objective function, a deep architecture relying on skip connections and a blend of synthetic and real-world training data, our approach is capable of producing high-quality scene depth. Extensive experimental evaluation demonstrates the efficacy of our approach compared to contemporary state-of-the-art techniques across both problem domains.
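The following is a minimal sketch, not the authors' implementation, of how the two-stage pipeline described above could be wired together, assuming PyTorch-style encoder-decoder sub-networks with a skip connection; all module names, channel widths and layer counts are illustrative assumptions.

```python
# Illustrative sketch of the two-sub-network multi-task pipeline:
# stage 1 predicts sparse depth from RGB, stage 2 completes it to dense depth.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Simple conv -> norm -> activation block used by both sub-networks.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SparseDepthNet(nn.Module):
    """First sub-network: predicts a sparse depth image from a single RGB image."""
    def __init__(self):
        super().__init__()
        self.encoder = conv_block(3, 32)
        self.decoder = nn.Conv2d(32, 1, kernel_size=3, padding=1)

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

class CompletionNet(nn.Module):
    """Second sub-network: completes sparse depth (with RGB guidance) into
    dense scene depth, using a skip connection from encoder to decoder."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(4, 32)        # input: RGB + sparse depth
        self.enc2 = conv_block(32, 64)
        self.dec1 = conv_block(64 + 32, 32)  # skip connection from enc1
        self.out = nn.Conv2d(32, 1, kernel_size=3, padding=1)

    def forward(self, rgb, sparse_depth):
        x = torch.cat([rgb, sparse_depth], dim=1)
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d1 = self.dec1(torch.cat([e2, e1], dim=1))
        return self.out(d1)

# Monocular depth estimation runs both stages; depth completion can use
# CompletionNet alone when a sparse depth input (e.g. LiDAR) is available.
rgb = torch.randn(1, 3, 256, 512)
sparse = SparseDepthNet()(rgb)
dense = CompletionNet()(rgb, sparse)
```

In the sketch, the skip connection concatenates early encoder features with deeper ones before decoding, mirroring the paper's reliance on skip connections to preserve fine scene structure.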
