Nonparametric scene parsing with deep convolutional features and dense alignment

This paper addresses two key issues that affect the performance of nonparametric scene parsing: (1) the semantic quality of image retrieval and (2) the accuracy of label transfer. First, because nonparametric methods annotate a query image by transferring labels from retrieved images, the retrieval step should find a set of images that are “semantically similar” to the query. Second, given the retrieval set, a good strategy is needed to transfer semantic labels with pixel-level accuracy. We focus on improving scene parsing accuracy on both fronts. We propose using state-of-the-art deep convolutional features as image descriptors to improve the semantic quality of the retrieved images. In addition, we incorporate dense alignment into the Markov Random Field inference framework to transfer labels with pixel-level accuracy. Our experiments on the SIFT Flow dataset show that the proposed approach improves over other nonparametric methods.
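The two stages named in the abstract, retrieval followed by label transfer, can be illustrated with a deliberately simplified sketch. It is not the paper's method. It assumes each image already has a precomputed descriptor vector (for example, activations from a convolutional network), ranks database images by cosine similarity (an assumed metric), and replaces dense alignment and Markov Random Field inference with a plain per-pixel majority vote. The function names `retrieve` and `transfer_labels` and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_desc, database_descs, k):
    """Return indices of the k database descriptors most similar to the query."""
    ranked = sorted(
        range(len(database_descs)),
        key=lambda i: cosine_similarity(query_desc, database_descs[i]),
        reverse=True,
    )
    return ranked[:k]


def transfer_labels(label_maps):
    """Per-pixel majority vote over label maps of identical size.

    Simplification: the retrieved images are assumed to be already aligned
    with the query, so no dense alignment or MRF smoothing is performed.
    """
    height, width = len(label_maps[0]), len(label_maps[0][0])
    parsed = []
    for r in range(height):
        row = []
        for c in range(width):
            votes = Counter(label_map[r][c] for label_map in label_maps)
            row.append(votes.most_common(1)[0][0])
        parsed.append(row)
    return parsed


# Toy database: four 3-D descriptors, each with a 2x2 label map (made-up data).
database_descs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.0, 0.2],
]
database_labels = [
    [["sky", "sky"], ["road", "road"]],
    [["sky", "sky"], ["tree", "road"]],
    [["building", "building"], ["building", "building"]],
    [["sky", "tree"], ["road", "road"]],
]

query_desc = [1.0, 0.05, 0.0]
neighbors = retrieve(query_desc, database_descs, k=3)
parsed = transfer_labels([database_labels[i] for i in neighbors])
print(neighbors)
print(parsed)
```

In this toy setup the dissimilar "building" image is excluded at retrieval time, so it never contributes votes. This mirrors the abstract's point that retrieval quality bounds what label transfer can achieve.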
