Semantic and Spatial Content Fusion for Scene Recognition

In the field of scene recognition, it is usually insufficient to use only one visual feature regardless of how discriminative the feature is. Therefore, the spatial location and semantic relationships of local features need to be captured together with the scene contextual information. In this paper we proposed a novel framework to project image contextual feature space with semantic space of local features into a map function. This embedding is performed based on a subset of training images denoted as an exemplar-set. This exemplar-set is composed of images that describe better the scene category’s attributes than the other images. The proposed framework learns a weighted combination of local semantic topics as well as global and spatial information, where the weights represent the features’ contributions in each scene category. An empirical study was performed on two of the most challenging scene datasets 15-Scene categories and 67-Indoor Scenes and the promising results of 89.47 and 45.0 were achieved respectively.

[1]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[2]  Pedro F. Felzenszwalb,et al.  Reconfigurable models for scene recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[4]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[5]  D. T. Lee,et al.  Boosted Multiple Kernel Learning for Scene Category Recognition , 2010, 2010 20th International Conference on Pattern Recognition.

[6]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[7]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Yang Yang,et al.  Learning semantic visual vocabularies using diffusion distance , 2009, CVPR.

[9]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[12]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  Mubarak Shah,et al.  Scene Modeling Using Co-Clustering , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Matthieu Guillaumin,et al.  Segmentation Propagation in ImageNet , 2012, ECCV.

[17]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, CVPR.

[18]  Deepu Rajan,et al.  Embedding Visual Words into Concept Space for Action and Scene Recognition , 2010, BMVC.

[19]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[21]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[22]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[23]  Jan-Olof Eklundh,et al.  Computer Vision — ECCV '94 , 1994, Lecture Notes in Computer Science.

[24]  Ramin Zabih,et al.  Non-parametric Local Transforms for Computing Visual Correspondence , 1994, ECCV.

[25]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[26]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[28]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.