Semantic segmentation of RGBD images based on deep depth regression

Abstract Depth information has been discovered to improve the performance of computer vision tasks, such as semantic segmentation and object recognition. However, careful acquisition of depth data needs highly developed depth sensors which are expensive. As a classic computer vision task, depth estimation from a single image has obtained promising results based on supervised learning methods. In this paper, we investigate the extension of color images with corresponding deep-regressed depth images in boosting the performance of semantic segmentation. Furthermore, the usage of combining color channels with the estimated depth or the ground truth depth channel is compared. Specifically, there are two stages in our work. Firstly, we adopt the framework of convolutional neural networks (CNN) for the depth estimation by combing the global depth network and the depth gradient network. After refining based on these two networks, the depth image map can be estimated in a deep-regressed manner. Secondly, after augmenting the color images with the predicted depth images, fully convolutional networks (FCN) are further used to implement the pixel-level semantic labeling. In the experiments, we employ two popular RGBD datasets, i.e., SUNRGBD and NYUDv2, for 37 and 40-class semantic segmentation, respectively. By comparing with the ground truth depth images, experimental results demonstrate that the networks trained on the estimated depth images can achieve comparable performance on facilitating the accuracy of semantic segmentation task.

[1]  Xindong Wu,et al.  Learning on Big Graph: Label Inference and Regularization with Anchor Hierarchy , 2017, IEEE Transactions on Knowledge and Data Engineering.

[2]  Xiao Liu,et al.  Stacked deep polynomial network based representation learning for tumor classification with small ultrasound image dataset , 2016, Neurocomputing.

[3]  Sinisa Todorovic,et al.  Monocular Depth Estimation Using Neural Regression Forest , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Dinggang Shen,et al.  Scalable joint segmentation and registration framework for infant brain images , 2017, Neurocomputing.

[5]  Heng Tao Shen,et al.  Exploiting Depth From Single Monocular Images for Object Detection and Semantic Segmentation , 2016, IEEE Transactions on Image Processing.

[6]  Dinggang Shen,et al.  Hierarchical unbiased graph shrinkage (HUGS): A novel groupwise registration for large data set , 2014, NeuroImage.

[7]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[9]  Xinlei Chen,et al.  PixelNet: Representation of the pixels, by the pixels, and for the pixels , 2017, ArXiv.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Meng Wang,et al.  Spatially guided local Laplacian filter for nature image detail enhancement , 2014, Multimedia Tools and Applications.

[12]  Shihui Ying,et al.  Histopathological Image Classification With Color Pattern Random Binary Hashing-Based PCANet and Matrix-Form Classifier , 2017, IEEE Journal of Biomedical and Health Informatics.

[13]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[14]  Stephen Gould,et al.  Single image depth estimation from predicted semantic labels , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Jitendra Malik,et al.  Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Zi Huang,et al.  A Sparse Embedding and Least Variance Encoding Approach to Hashing , 2014, IEEE Transactions on Image Processing.

[17]  Shichao Zhang,et al.  Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[18]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Xuelong Li,et al.  Block-Row Sparse Multiview Multilabel Learning for Image Classification , 2016, IEEE Transactions on Cybernetics.

[20]  Dinggang Shen,et al.  Building dynamic population graph for accurate correspondence detection , 2015, Medical Image Anal..

[21]  汪萌,et al.  Image Annotation By Multiple-Instance Learning With Discriminative Feature Mapping and Selection , 2014 .

[22]  Alan L. Yuille,et al.  Towards unified depth and semantic prediction from a single image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Markus Vincze,et al.  Enhancing Semantic Segmentation for Robotics: The Power of 3-D Entangled Forests , 2016, IEEE Robotics and Automation Letters.

[24]  Shihui Ying,et al.  Multimodal Neuroimaging Feature Learning With Multimodal Stacked Deep Polynomial Networks for Diagnosis of Alzheimer's Disease , 2018, IEEE Journal of Biomedical and Health Informatics.

[25]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Nanning Zheng,et al.  Exact solution to median surface problem using 3D graph search and application to parameter space exploration , 2015, Pattern Recognit..

[27]  Ke Li,et al.  Probability iterative closest point algorithm for m-D point set registration with noise , 2015, Neurocomputing.

[28]  Zhen Li,et al.  LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling , 2016, ECCV.

[29]  Zi Huang,et al.  Linear cross-modal hashing for efficient multimedia search , 2013, ACM Multimedia.

[30]  Zhijie Wen,et al.  Manifold Preserving: An Intrinsic Approach for Semisupervised Distance Metric Learning , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[31]  Gregory D. Hager,et al.  Hierarchical semantic parsing for object pose estimation in densely cluttered scenes , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[32]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.