Single image depth prediction using super-column super-pixel features

Depth prediction from a single monocular image is a challenging yet valuable task, as often a depth sensor is not available. The state-of-the-art approach [1] combines a deep fully convolutional network (DFCN) with a conditional random field (CRF), allowing the CRF to correct and smooth the depth values estimated by the DFCN according to efficient contextual modeling. However, using the output of the DFCN as unary input for CRF is limited by using only the last layer of the DFCN. The middle layers of the DFCN have been shown to carry useful information for other scene understanding tasks, which may help to improve the prediction quality. This paper proposes a novel super-column superpixel (SCSP) feature that is the combination of multiple layers of the DFCN after a super-pixel pooling process. The proposed approach based on the SCSP features reduces the root mean square (rms) error of the prediction by more than 16% in NYUv2 dataset.

[1]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[2]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[3]  Xuming He,et al.  Discrete-Continuous Depth Estimation from a Single Image , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[6]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Jianbo Shi,et al.  Semantic Segmentation with Boundary Neural Fields , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Victor S. Lempitsky,et al.  N4-Fields: Neural Network Nearest Neighbor Fields for Image Transforms , 2014, ArXiv.

[9]  Luca Maria Gambardella,et al.  Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images , 2012, NIPS.

[10]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  Peter Robinson,et al.  Continuous Conditional Neural Fields for Structured Regression , 2014, ECCV.

[14]  Rob Fergus,et al.  Restoring an Image Taken through a Window Covered with Dirt or Rain , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Jianbo Shi,et al.  High-for-Low and Low-for-High: Efficient Boundary Detection from Deep Object Features and Its Applications to High-Level Vision , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[17]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.