Context Driven Scene Parsing with Attention to Rare Classes

This paper presents a scalable scene parsing algorithm based on image retrieval and superpixel matching. We focus on rare object classes, which contribute more to a rich semantic understanding of visual scenes than common background classes do. To this end, we make two novel contributions: rare class expansion and semantic context description. First, because the label distribution is long-tailed, we expand the retrieval set with rare class exemplars and thus achieve more balanced superpixel classification results. Second, we incorporate both global and local semantic context information through a feedback-based mechanism to refine image retrieval and superpixel matching. Results on the SIFTflow and LMSun datasets show the superior performance of our algorithm, especially on the rare classes, without sacrificing overall labeling accuracy.
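
The rare class expansion step can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `coverage` threshold used to decide which classes count as rare, and the `per_class` cap are all assumptions made for illustration, and superpixels are reduced to hypothetical `(id, label)` pairs with no feature matching.

```python
from collections import Counter


def rare_classes(train_labels, coverage=0.9):
    """Return the classes in the tail of the label distribution.

    Classes are visited from most to least frequent. The head classes that
    together account for `coverage` of all labels are treated as common;
    everything after them is treated as rare. (Assumed criterion.)
    """
    counts = Counter(train_labels)
    total = sum(counts.values())
    covered = 0
    rare = set()
    for cls, n in counts.most_common():
        if covered >= coverage * total:
            rare.add(cls)
        covered += n
    return rare


def expand_retrieval_set(retrieved, exemplars, rare, per_class=2):
    """Append up to `per_class` exemplars of each rare class.

    `retrieved` and `exemplars` are lists of (superpixel_id, label) pairs.
    Exemplars already in the retrieval set are skipped.
    """
    expanded = list(retrieved)
    seen = {sp_id for sp_id, _ in retrieved}
    added = Counter()
    for sp_id, label in exemplars:
        if label in rare and sp_id not in seen and added[label] < per_class:
            expanded.append((sp_id, label))
            seen.add(sp_id)
            added[label] += 1
    return expanded


# Toy long-tailed label distribution: two dominant background classes.
train_labels = ["sky"] * 50 + ["road"] * 45 + ["boat"] * 3 + ["bird"] * 2
rare = rare_classes(train_labels)

retrieved = [(1, "sky"), (2, "road")]
exemplars = [(10, "boat"), (11, "boat"), (12, "boat"), (13, "bird"), (14, "sky")]
expanded = expand_retrieval_set(retrieved, exemplars, rare)
```

In this toy setup the retrieval set initially contains only background superpixels, so a nearest-neighbor classifier could never predict `boat` or `bird`. After expansion, both rare classes are represented, which is the balancing effect the abstract describes.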