High Quality Depth Estimation from Monocular Images Based on Depth Prediction and Enhancement Sub-Networks

This paper addresses the problem of depth estimation from a single RGB image. Previous methods mainly focus on the problems of depth prediction accuracy and output depth resolution, but seldom of them can tackle these two problems well. Here, we present a novel depth estimation framework based on deep convolutional neural network (CNN) to learn the mapping between monocular images and depth maps. The proposed architecture can be divided into two components, i.e., depth prediction and depth enhancement sub-networks. We first design a depth prediction network based on the ResNet architecture to infer the scene depth from color image. Then, a depth enhancement network is concatenated to the end of the depth prediction network to obtain a high resolution depth map. Experimental results show that the proposed method outperforms other methods on benchmark RGB-D datasets and achieves state-of-the-art performance.

[1]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[2]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[3]  Ce Liu,et al.  Depth Transfer: Depth Extraction from Video Using Non-Parametric Sampling , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Jia Xu,et al.  Fast Image Processing with Fully-Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[7]  Yao Wang,et al.  Color-Guided Depth Recovery From RGB-D Data Using an Adaptive Autoregressive Model , 2014, IEEE Transactions on Image Processing.

[8]  Jingyu Yang,et al.  SPA: Sparse Photorealistic Animation Using a Single RGB-D Camera , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[9]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Reinhard Koch,et al.  Time‐of‐Flight Cameras in Computer Graphics , 2010, Comput. Graph. Forum.

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Xuming He,et al.  Discrete-Continuous Depth Estimation from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Heiko Hirschmüller,et al.  Stereo Processing by Semiglobal Matching and Mutual Information , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[15]  Kwanghoon Sohn,et al.  Depth Analogy: Data-Driven Approach for Single Image Depth Estimation Using Gradient Samples , 2015, IEEE Transactions on Image Processing.

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Chunhua Shen,et al.  Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.