Semantic Segmentation of Indoor-Scene RGB-D Images Based on Iterative Contraction and Merging

In this paper, we propose an iterative contraction and merging (ICM) framework for semantic segmentation of indoor scenes. Given an input RGB image and a raw depth image, we first derive a dense prediction map from a convolutional neural network (CNN) and a normal vector map from the depth image. Combining the RGB-D image with these two maps, we guide the ICM process to build a more accurate hierarchical segmentation tree in a bottom-up manner. Based on this tree, we then design a decision process that uses the dense prediction map as a reference to make the final semantic segmentation decision. Experimental results show that the proposed method generates much more accurate object boundaries than state-of-the-art methods.
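As an illustration of the normal vector map mentioned above, the following is a minimal sketch of one common way to estimate per-pixel normals from a depth image. The abstract does not specify how the normals are computed, so everything here is an assumption: the function name `normal_map` is hypothetical, the depth image is treated as a height field z(x, y) with unit pixel spacing, and camera intrinsics and missing depth values are ignored.

```python
import math


def normal_map(depth):
    """Estimate unit surface normals from a depth grid given as a list of rows.

    Assumption (not from the paper): depth is a height field z(x, y) with
    unit pixel spacing, so the unnormalized normal is (-dz/dx, -dz/dy, 1).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences in the interior, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals
```

On a fronto-parallel plane (constant depth) every normal points along the z axis, while a plane whose depth grows linearly with x tilts all normals by the same amount. A real RGB-D pipeline would typically back-project pixels to 3D using the camera intrinsics and handle holes in the raw depth before estimating normals.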
