Scene recognition by semantic visual words

In this paper, we propose a novel approach to introduce semantic relations into the bag-of-words framework. We use the latent semantic models, such as latent semantic analysis (LSA) and probabilistic latent semantic analysis (pLSA), in order to define semantically rich features and embed the visual features into a semantic space. The semantic features used in LSA technique are derived from the low-rank approximation of word–image occurrence matrix by singular value decomposition. Similarly, by using the pLSA approach, the topic-specific distributions of words can be considered dimensions of a concept space. In the proposed space, the distances between words represent the semantic distances which are used for constructing a discriminative and semantically meaningful vocabulary. Position information significantly improves scene recognition accuracy. Inspired by this, in this paper, we bring position information into the proposed semantic vocabulary frameworks. We have tested our approach on the 15-Scene and 67-MIT Indoor datasets and have achieved very promising results.

[1]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Jean Ponce,et al.  Local, semi-local and global models for texture, object and scene recognition , 2006 .

[3]  Jitendra Malik,et al.  Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories , 2006 .

[5]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[6]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[7]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8]  Mubarak Shah,et al.  Scene Modeling Using Co-Clustering , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[9]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, CVPR.

[10]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[11]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[12]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[13]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[14]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Nuno Vasconcelos,et al.  Scene Recognition on the Semantic Manifold , 2012, ECCV.

[16]  HofmannThomas Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2001 .

[17]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[18]  Xin Li,et al.  An Object Co-occurrence Assisted Hierarchical Model for Scene Understanding , 2012, BMVC.

[19]  Nicholas Roy,et al.  Indoor scene recognition through object detection , 2010, 2010 IEEE International Conference on Robotics and Automation.

[20]  Tsuhan Chen,et al.  Unsupervised Image Categorization and Object Localization using Topic Models and Correspondences between Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21]  Wanqing Li,et al.  Incorporating local and global information using a novel distance function for scene recognition , 2013, 2013 IEEE Workshop on Robot Vision (WORV).

[22]  Andrew Zisserman,et al.  A Statistical Approach to Texture Classification from Single Images , 2004, International Journal of Computer Vision.

[23]  Francesca Odone,et al.  Building kernels from binary strings for image matching , 2005, IEEE Transactions on Image Processing.

[24]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[25]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[26]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[27]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[30]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[32]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[33]  Dima Damen,et al.  Detecting Carried Objects in Short Video Sequences , 2008, ECCV.

[34]  Mubarak Shah,et al.  Learning human actions via information maximization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Deepu Rajan,et al.  Embedding Visual Words into Concept Space for Action and Scene Recognition , 2010, BMVC.

[36]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[38]  Yang Yang,et al.  Learning semantic visual vocabularies using diffusion distance , 2009, CVPR.

[39]  Bernt Schiele,et al.  Natural Scene Retrieval Based on a Semantic Modeling Step , 2004, CIVR.

[40]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[41]  Pedro F. Felzenszwalb,et al.  Reconfigurable models for scene recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.