Multi-context Deep Convolutional Features and Exemplar-SVMs for Scene Parsing

Scene parsing is a challenging task in computer vision field. The work of scene parsing is labeling every pixel in an image with its semantic category to which it belongs. In this paper, we solve this problem by proposing an approach that combines the multi-context deep convolutional features with exemplar-SVMs for scene parsing. A convolutional neural network is employed to learn the multi-context deep features which include image global features and local features. In contrast to hand-crafted feature extraction approaches, the convolutional neural network learns features automatically and the features can better describe images on the task. In order to obtain a high class recognition accuracy, our system consists of the exemplar-SVMs which is training a linear SVM classifier for every exemplar in the training set for classification. Finally, multiple cues are integrated into a Markov Random Field framework to infer the parsing result. We apply our system to two challenging datasets, SIFT Flow dataset and the dataset which is collected by ourselves. The experimental results demonstrate that our method can achieve good performance.

[1]  Rob Fergus,et al.  Nonparametric image parsing using adaptive neighbor sets , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Junwei Han,et al.  Scene parsing using inference Embedded Deep Networks , 2016, Pattern Recognit..

[3]  Gang Wang,et al.  Scene Parsing With Integration of Parametric and Non-Parametric Models , 2016, IEEE Transactions on Image Processing.

[4]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[6]  Song-Chun Zhu,et al.  Single-View 3D Scene Reconstruction and Parsing by Attribute Grammar , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Marian George,et al.  Image parsing with a wide range of classes and scene-level context , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Antonio Torralba,et al.  Nonparametric scene parsing: Label transfer via dense scene alignment , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Bolei Zhou,et al.  Open Vocabulary Scene Parsing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[13]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[14]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[15]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Ming-Hsuan Yang,et al.  Context Driven Scene Parsing with Attention to Rare Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Qinping Zhao,et al.  Partial similarity based nonparametric scene parsing in certain environment , 2011, CVPR 2011.

[18]  Svetlana Lazebnik,et al.  Superparsing , 2010, International Journal of Computer Vision.

[19]  Svetlana Lazebnik,et al.  Finding Things: Image Parsing with Regions and Per-Exemplar Detectors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Yann LeCun,et al.  Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers , 2012, ICML.

[22]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[23]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.