Nonparametric Scene Parsing via Label Transfer

While there has been a lot of recent work on object recognition and image understanding, the focus has been on carefully establishing mathematical models for images, scenes, and objects. In this paper, we propose a novel, nonparametric approach for object recognition and scene parsing using a new technology we name label transfer. For an input image, our system first retrieves its nearest neighbors from a large database containing fully annotated images. Then, the system establishes dense correspondences between the input image and each of the nearest neighbors using the dense SIFT flow algorithm [28], which aligns two images based on local image structures. Finally, based on the dense scene correspondences obtained from SIFT flow, our system warps the existing annotations and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Promising experimental results have been achieved by our nonparametric scene parsing system on challenging databases. Compared to existing object recognition approaches that require training classifiers or appearance models for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.

[1]  Patrick J. F. Groenen,et al.  Modern Multidimensional Scaling: Theory and Applications , 2003 .

[2]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[3]  Timothy F. Cootes,et al.  Face Recognition Using Active Appearance Models , 1998, ECCV.

[4]  P. Groenen,et al.  Modern Multidimensional Scaling: Theory and Applications , 1999 .

[5]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[6]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[7]  Edward H. Adelson,et al.  On seeing stuff: the perception of materials by humans and machines , 2001, IS&T/SPIE Electronic Imaging.

[8]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[9]  Baining Guo,et al.  Real-time texture synthesis by patch-based sampling , 2001, TOGS.

[10]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[13]  Antonio Torralba,et al.  Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes , 2003, NIPS.

[14]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[15]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[16]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[17]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Antonio Torralba,et al.  Describing Visual Scenes using Transformed Dirichlet Processes , 2005, NIPS.

[22]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Stepán Obdrzálek,et al.  Sub-linear Indexing for Large Scale Object Recognition , 2005, BMVC.

[24]  Silvio Savarese,et al.  Discriminative Object Class Models of Appearance and Shape by Correlatons , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Sung Yong Shin,et al.  On pixel-based texture synthesis by non-parametric sampling , 2006, Comput. Graph..

[28]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Jitendra Malik,et al.  Image Retrieval and Classification Using Local Distance Functions , 2006, NIPS.

[30]  Trevor Darrell,et al.  Pyramid Match Kernels: Discriminative Classification with Sets of Image Features (version 2) , 2006 .

[31]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[32]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[33]  Antonio Torralba,et al.  Object Recognition by Scene Alignment , 2007, NIPS.

[34]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[36]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[37]  Daphne Koller,et al.  Learning Spatial Context: Using Stuff to Find Things , 2008, ECCV.

[38]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[39]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Larry S. Davis,et al.  Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers , 2008, ECCV.

[41]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[42]  Alexei A. Efros,et al.  Segmenting Scenes by Matching Image Composites , 2009, NIPS.

[43]  Alexei A. Efros,et al.  An empirical study of context in object detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[45]  Antonio Torralba,et al.  Nonparametric scene parsing: Label transfer via dense scene alignment , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Yi Yang,et al.  Layered object detection for multi-class segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  Charless C. Fowlkes,et al.  Multiresolution Models for Object Detection , 2010, ECCV.

[48]  Gert R. G. Lanckriet,et al.  Multi-class object localization by combining local contextual interactions , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[49]  Antonio Torralba,et al.  Exploiting hierarchical context on a large database of object categories , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[50]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[51]  Svetlana Lazebnik,et al.  Superparsing , 2010, International Journal of Computer Vision.

[52]  Pietro Perona,et al.  Visual Recognition with Humans in the Loop , 2010, ECCV.

[53]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  V. Kshirsagar,et al.  Face recognition using Eigenfaces , 2011, 2011 3rd International Conference on Computer Research and Development.