Scene Recognition by Joint Learning of DNN from Bag of Visual Words and Convolutional DCT Features

ABSTRACT

Scene recognition is used in many computer vision and related applications, including information retrieval, robotics, real-time monitoring, and event classification. Because scene recognition is a complex task, it has benefited greatly from deep learning architectures trained on large, comprehensive datasets. This paper presents a scene classification method in which local and global features are concatenated with DCT-convolutional features extracted by AlexNet. The concatenated features are then fed into AlexNet's fully connected layers for classification. The local and global features are made efficient by choosing an appropriate Bag of Visual Words (BOVW) vocabulary size and by applying feature selection techniques; both choices are evaluated in the experimental section. We modified AlexNet by adding further dense (fully connected) layers and compared its results with a model previously trained on the Places365 dataset. Our model is also compared with other scene recognition methods and outperforms them in terms of accuracy.
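The following is a minimal sketch, under stated assumptions, of how the three feature groups named in the abstract could be assembled into one vector before the fully connected layers. It is not the authors' implementation. The clustering method (plain Lloyd's k-means), the L1-normalised histogram, the orthonormal 2-D DCT-II applied per channel, the low-frequency crop size `keep`, all function names, and the random stand-in inputs are hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def kmeans_codebook(descriptors, k, iters=20, seed=0):
    """Build a k-word visual codebook with plain Lloyd's k-means (assumed method)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each descriptor to its nearest center (squared Euclidean distance).
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the previous center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word assignments for one image."""
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def dct_conv_features(feature_maps, keep=4):
    """2-D DCT of each channel of a C x H x W map; keep the top-left keep x keep block."""
    c, h, w = feature_maps.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ feature_maps @ dw.T  # matmul broadcasts over the channel axis
    return coeffs[:, :keep, :keep].reshape(-1)

def fused_vector(local_desc, codebook, global_feat, conv_maps, keep=4):
    """Concatenate BOVW histogram, global features, and DCT-convolutional features."""
    return np.concatenate([
        bovw_histogram(local_desc, codebook),
        global_feat,
        dct_conv_features(conv_maps, keep),
    ])

# Usage with random stand-ins (hypothetical shapes, not the paper's).
rng = np.random.default_rng(1)
train_desc = rng.normal(size=(200, 8))   # stand-in local descriptors from training images
codebook = kmeans_codebook(train_desc, k=10)
img_desc = rng.normal(size=(30, 8))      # local descriptors of one image
global_feat = rng.normal(size=16)        # stand-in global descriptor
conv_maps = rng.normal(size=(5, 6, 6))   # stand-in convolutional activations
x = fused_vector(img_desc, codebook, global_feat, conv_maps)
print(x.shape)  # (106,) = 10 words + 16 global + 5 channels * 4 * 4 DCT coefficients
```

Keeping only the low-frequency corner of each channel's DCT is one common way to compact convolutional activations; in this sketch the length of the fused vector, and therefore the input width of the fully connected layers, depends directly on the vocabulary size `k` and on `keep`.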
