Unsupervised detail-preserving network for high quality monocular depth estimation

Abstract

In this paper, we propose an unsupervised learning framework for monocular depth estimation that addresses two problems: inaccurate inference of depth details and loss of spatial information. First, because the framework is unsupervised, it is trained on easily collected stereo image pairs instead of ground-truth depth data. Second, we design a rectangle convolution that captures global dependencies between neighboring pixels across entire rows or columns of an image, which significantly improves the inference of depth details. Third, we propose a learned depth refinement module, including a color-guided refinement layer and a learned composite proximal operator, that preserves depth discontinuities and yields high-quality depth maps. The proposed network is fully differentiable and trainable end to end. Extensive experiments on the KITTI, Cityscapes and Make3D datasets demonstrate state-of-the-art performance and good cross-dataset generalization.
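The abstract states only that stereo pairs replace ground-truth depth during training. A common way stereo-based unsupervised methods obtain a training signal is view synthesis: the network predicts a disparity map for the left image, the right image is warped into the left view with that disparity, and the photometric difference between the real and synthesized left images is minimized. The following is a minimal sketch of that idea, not the authors' implementation. It assumes rectified grayscale images stored as nested lists, disparity measured in pixels, linear interpolation along each row with border clamping, and a plain L1 photometric loss; any SSIM or smoothness terms the framework may use are omitted, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image indexed as img[row][col]


def sample_row(row: List[float], x: float) -> float:
    """Linearly interpolate a row at fractional column x, clamping at the borders."""
    width = len(row)
    x = min(max(x, 0.0), width - 1.0)
    x0 = int(x)                  # floor, since x >= 0 after clamping
    x1 = min(x0 + 1, width - 1)  # right neighbour, clamped at the last column
    w = x - x0                   # interpolation weight toward x1
    return (1.0 - w) * row[x0] + w * row[x1]


def warp_right_to_left(right: Image, disparity: Image) -> Image:
    """Synthesize the left view (hypothetical helper).

    For a rectified pair, a left-image pixel at column x appears at column
    x - d in the right image, so left_hat[y][x] = right[y][x - d(y, x)].
    """
    return [
        [sample_row(right[y], x - disparity[y][x]) for x in range(len(right[y]))]
        for y in range(len(right))
    ]


def photometric_l1(target: Image, reconstruction: Image) -> float:
    """Mean absolute difference between two equally sized images (assumed L1 loss)."""
    total, count = 0.0, 0
    for row_t, row_r in zip(target, reconstruction):
        for a, b in zip(row_t, row_r):
            total += abs(a - b)
            count += 1
    return total / count
```

For example, warping the row `[10, 20, 30, 40, 50]` with a constant disparity of 1 gives `[10, 10, 20, 30, 40]` (the first pixel is clamped to the border), and a left image equal to that reconstruction yields a loss of 0. Because the interpolation weights depend smoothly on the disparity, a tensor version of this warp is differentiable with respect to the predicted disparity, which is what allows such a loss to train a network end to end without ground-truth depth.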
