Learning strengths and weaknesses of classifiers for RGB-D semantic segmentation

3D scene understanding is an open challenge in computer vision. Most work focuses on 2D methods, which assign a semantic label to each RGB pixel. This paper instead considers the 3D semantic labeling of RGB-D images. In the proposed method, a superpixel generation algorithm is first applied to the RGB image to partition it into a set of disjoint superpixels, from which meaningful features are extracted. A set of three powerful classifiers is then used to semantically label each superpixel, and the probability outputs of these classifiers are concatenated to form a novel feature vector for each superpixel. To learn the strengths and weaknesses of each classifier, and to capture the contextual relationships among neighboring superpixels, a conditional random field framework is used, whose unary potential function is learned from these new feature vectors. The proposed method is evaluated on the challenging NYU-V2 RGB-D dataset and improves the pixel average accuracy compared to previous methods.
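The feature construction described above can be illustrated with a minimal sketch. It is not the paper's implementation. The three classifiers are replaced by synthetic probability outputs, the unary potential is assumed to be the negative log-probability of a softmax regression fitted on the concatenated probabilities, and the pairwise term is assumed to be a simple Potts penalty over a hypothetical chain of neighboring superpixels. All function names and parameter values are illustrative.

```python
import numpy as np

def stack_probabilities(prob_list):
    """Concatenate K class-probability matrices (N x C each) into N x (K*C)."""
    return np.concatenate(prob_list, axis=1)

def fit_softmax_unary(X, y, n_classes, lr=0.5, epochs=300, l2=1e-3):
    """Fit a multinomial logistic regression by plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                           # gradient w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b

def unary_potentials(X, W, b):
    """Unary cost per superpixel and class: negative log-probability."""
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p

def crf_energy(labels, unary, edges, pairwise_weight):
    """Sum of unary costs plus a Potts penalty for each disagreeing edge."""
    u = unary[np.arange(len(labels)), labels].sum()
    p = (labels[edges[:, 0]] != labels[edges[:, 1]]).sum()
    return u + pairwise_weight * p

# Synthetic stand-in data: N superpixels, C classes, K = 3 classifiers.
rng = np.random.default_rng(0)
N, C = 300, 3
y = rng.integers(0, C, size=N)

def noisy_probs(y, strength):
    """Hypothetical classifier output: softmax of a noisy one-hot signal."""
    logits = strength * np.eye(C)[y] + rng.normal(size=(len(y), C))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

probs = [noisy_probs(y, s) for s in (2.0, 1.5, 1.0)]
features = stack_probabilities(probs)             # one row per superpixel
W, b = fit_softmax_unary(features, y, C)
unary = unary_potentials(features, W, b)

# Hypothetical adjacency: superpixel i neighbors superpixel i + 1.
edges = np.array([[i, i + 1] for i in range(N - 1)])
labels = unary.argmin(axis=1)                     # unary-only labeling
energy = crf_energy(labels, unary, edges, pairwise_weight=0.5)
```

Because the unary term is fitted on the stacked probabilities, the learned weights can favor whichever classifier is more reliable for a given class, which is one way to read "learning strengths and weaknesses". A full CRF would minimize the energy over all labelings rather than score the unary-only labeling as done here.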
