Scene Parsing with Deep Features and Spatial Structure Learning

The Conditional Random Field (CRF) is a powerful tool for labeling tasks and has long played a key role in object recognition and semantic segmentation. However, the quality of CRF labeling depends on the selected features, which becomes the bottleneck for improving accuracy. In this paper, we likewise formulate semantic segmentation within the CRF framework. Unlike other CRF-based strategies, which rely on image appearance features that reveal only limited information, we combine our framework with deep learning methods, such as Convolutional Neural Networks (CNNs), for feature extraction; these methods have shown strong ability and remarkable performance. We call this combined strategy deep-feature CRF (dCRF). Through dCRF, the deep information in an image is exposed and utilized, and segmentation accuracy increases. The proposed dCRF strategy is evaluated on the SIFT-Flow and VOC2007 datasets. The segmentation results show that using features learned by deep networks within our CRF framework significantly improves the performance of our semantic segmentation strategy.
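To make the general idea concrete, the following is a minimal sketch of a CRF over image regions whose unary terms come from per-region feature vectors. It is not the dCRF model itself: the abstract does not specify the potentials or the inference method, so the linear softmax unary term, the Potts pairwise term, and the iterated conditional modes (ICM) inference used here are all assumptions, and the short feature vectors merely stand in for CNN features. All function and parameter names are hypothetical.

```python
import math

def unary_costs(features, class_weights):
    """Negative log-softmax of linear class scores for each node.
    `features` stands in for deep (CNN) descriptors; this is an assumption."""
    costs = []
    for f in features:
        scores = [sum(w * x for w, x in zip(row, f)) for row in class_weights]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        costs.append([log_z - s for s in scores])
    return costs

def energy(labels, unary, edges, pairwise_weight):
    """CRF energy: sum of unary costs plus a Potts penalty on each edge
    whose two endpoints carry different labels."""
    data_term = sum(unary[i][labels[i]] for i in range(len(labels)))
    smooth_term = sum(1 for i, j in edges if labels[i] != labels[j])
    return data_term + pairwise_weight * smooth_term

def local_cost(i, k, labels, unary, neighbors, pairwise_weight):
    """Cost of assigning label k to node i with all other labels fixed."""
    disagreements = sum(1 for j in neighbors[i] if labels[j] != k)
    return unary[i][k] + pairwise_weight * disagreements

def icm(unary, edges, pairwise_weight, max_sweeps=10):
    """Iterated conditional modes: start from the best unary label per node,
    then greedily relabel nodes while the energy strictly decreases."""
    n, num_labels = len(unary), len(unary[0])
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    labels = [min(range(num_labels), key=lambda k: u[k]) for u in unary]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            best = min(range(num_labels),
                       key=lambda k: local_cost(i, k, labels, unary,
                                                neighbors, pairwise_weight))
            if (local_cost(i, best, labels, unary, neighbors, pairwise_weight)
                    < local_cost(i, labels[i], labels, unary, neighbors,
                                 pairwise_weight)):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Toy example: four regions in a chain, two classes, 2-D stand-in features.
features = [[2.0, 0.0], [2.0, 0.0], [0.9, 1.0], [2.0, 0.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical classifier weights
edges = [(0, 1), (1, 2), (2, 3)]
unary = unary_costs(features, class_weights)
labels = icm(unary, edges, pairwise_weight=0.5)
```

In this toy chain, the third region weakly prefers the second class on its own, but the pairwise term pulls it toward the label of its neighbors. The quality of the unary costs, and therefore of the final labeling, depends directly on how informative the features are, which is the dependency the abstract describes.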
