Scene parsing by nonparametric label transfer of content-adaptive windows

CollageParsing is a scene parsing algorithm that matches content-adaptive windows.Unlike superpixels, content-adaptive windows are designed to preserve objects.A powerful MRF unary is constructed by performing label transfer using the windows.Gains of 15-19% average per-class accuracy are obtained on a standard benchmark. Scene parsing is the task of labeling every pixel in an image with its semantic category. We present CollageParsing, a nonparametric scene parsing algorithm that performs label transfer by matching content-adaptive windows. Content-adaptive windows provide a higher level of perceptual organization than superpixels, and unlike superpixels are designed to preserve entire objects instead of fragmenting them. Performing label transfer using content-adaptive windows enables the construction of a more effective Markov random field unary potential than previous approaches. On a standard benchmark consisting of outdoor scenes from the LabelMe database, CollageParsing obtains state-of-the-art performance with 15-19% higher average per-class accuracy than recent nonparametric scene parsing algorithms.

[1]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[3]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[4]  Derek Hoiem,et al.  Category-Independent Object Proposals with Diverse Ranking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[6]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[8]  Antonio Torralba,et al.  Nonparametric Scene Parsing via Label Transfer , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[11]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Heesoo Myeong,et al.  Learning object relationships via graph-based context model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Cordelia Schmid,et al.  Object Recognition by Integrating Multiple Image Segmentations , 2008, ECCV.

[14]  Vittorio Ferrari,et al.  Figure-ground segmentation by transferring window masks , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  James J. Little,et al.  CollageParsing: Nonparametric Scene Parsing by Adaptive Overlapping Windows , 2014, ECCV.

[16]  Pushmeet Kohli,et al.  Graph Cut Based Inference with Co-occurrence Statistics , 2010, ECCV.

[17]  Rob Fergus,et al.  Nonparametric image parsing using adaptive neighbor sets , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Stephen Gould,et al.  PatchMatchGraph: Building a Graph of Dense Patch Correspondences for Label Transfer , 2012, ECCV.

[19]  Jana Kosecka,et al.  Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  David G. Lowe,et al.  Spatially Local Coding for Object Recognition , 2012, ACCV.

[22]  Eli Shechtman,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, ACM Trans. Graph..

[23]  Trevor Darrell,et al.  The NBNN kernel , 2011, 2011 International Conference on Computer Vision.

[24]  Svetlana Lazebnik,et al.  Finding Things: Image Parsing with Regions and Per-Exemplar Detectors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[26]  Ali Shahrokni,et al.  Mesh Based Semantic Modelling for Indoor and Outdoor Scenes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Cristian Sminchisescu,et al.  CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Bernt Schiele,et al.  How good are detection proposals, really? , 2014, BMVC.

[31]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[33]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[34]  Vladimir Kolmogorov,et al.  What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Cristian Sminchisescu,et al.  Composite Statistical Inference for Semantic Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[38]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[39]  Liqing Zhang,et al.  Saliency Detection: A Spectral Residual Approach , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Svetlana Lazebnik,et al.  Superparsing - Scalable Nonparametric Image Parsing with Superpixels , 2010, International Journal of Computer Vision.

[41]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[42]  Nikos Komodakis,et al.  Markov Random Field modeling, inference & learning in computer vision & image understanding: A survey , 2013, Comput. Vis. Image Underst..

[43]  Ming-Hsuan Yang,et al.  Context Driven Scene Parsing with Attention to Rare Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Yann LeCun,et al.  Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers , 2012, ICML.

[45]  Daphne Koller,et al.  Learning Spatial Context: Using Stuff to Find Things , 2008, ECCV.

[46]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.