Represent, reduce, classify: The essential stages for scene recognition

In this paper, the scene recognition problem is investigated in detail through three stages: scene representation, dimension reduction, and classification. Unlike other studies, the proposed algorithm models the overall structure of the scene rather than relying on an object-based approach. To that end, visual representations such as MPEG-7, Gist, BoW, VLAD, and Fisher vectors are classified individually or jointly with Support Vector Machine and Random Forest classifiers. The evaluation is conducted on the MIT indoor dataset [2], where an average precision of 31% is attained by combining Scalable Color, Homogeneous Texture, and VLAD with a Support Vector Machine.
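As an illustration of the representation stage, the following is a minimal sketch of VLAD aggregation in the sense of [12]: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalised. It is not the authors' implementation. The function name `vlad_encode`, the toy centroids and descriptors, and the choice of plain L2 normalisation are assumptions made for this example; in practice the descriptors would be local features such as SIFT [10] and the centroids would come from k-means clustering.

```python
import math


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD vector.

    descriptors: list of d-dimensional local descriptors (lists of floats)
    centroids:   list of k d-dimensional codebook centres (hypothetical here)
    Returns a flat list of length k * d.
    """
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid
        # (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid).
        for j in range(d):
            residual_sums[nearest][j] += x[j] - centroids[nearest][j]

    # Concatenate the per-centroid sums and L2-normalise the result.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy data: two centroids and three 2-D descriptors (illustrative values only).
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
vlad = vlad_encode(descriptors, centroids)
```

The resulting vector has one block of residuals per centroid, so its length grows as the codebook size times the descriptor dimension, which is one reason a dimension reduction stage is placed before classification.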

[1] Antonio Torralba et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, 2001, International Journal of Computer Vision.

[2] Antonio Torralba et al. Recognizing indoor scenes, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3] B. S. Manjunath et al. Introduction to MPEG-7: Multimedia Content Description Interface, 2002.

[4] Thomas Mensink et al. Improving the Fisher Kernel for Large-Scale Image Classification, 2010, ECCV.

[5] A. Aydin Alatan et al. Multimodal concept detection in broadcast media: KavTan, 2013, Multimedia Tools and Applications.

[6] Alexei A. Efros et al. Unsupervised Discovery of Mid-Level Discriminative Patches, 2012, ECCV.

[7] Hao Su et al. Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification, 2010, NIPS.

[8] Bolei Zhou et al. Learning Deep Features for Scene Recognition using Places Database, 2014, NIPS.

[9] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[10] David G. Lowe et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[11] Andrew Zisserman et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12] Cordelia Schmid et al. Aggregating Local Image Descriptors into Compact Codes, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.