Combining Descriptors Extracted from Feature Maps of Deconvolutional Networks and SIFT Descriptors in Scene Image Classification

This paper presents a new method for combining descriptors extracted from the feature maps of Deconvolutional Networks with SIFT descriptors. Both kinds of descriptors are converted into histograms of local patterns so that they can be concatenated, and the concatenation increases the classification rate. We use the K-means clustering algorithm to construct codebooks and compute Spatial Histograms that represent the distribution of local patterns in an image. These histograms are then concatenated into a new histogram that represents more local patterns than either of the originals. In the classification step, a Support Vector Machine (SVM) with a Histogram Intersection Kernel is used. In experiments on the Scene-15 dataset, which contains 15 categories, our method achieves classification rates of around 84%, outperforming Reconfigurable Bag-of-Words (RBoW), Sparse Covariance Patterns (SCP), Spatial Pyramid Matching (SPM), Spatial Pyramid Matching using Sparse Coding (ScSPM), and Visual Word Reweighting (VWR).
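The pipeline summarized above (K-means codebooks, spatial histograms, concatenation, and a histogram intersection kernel) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the SIFT and Deconvolutional Network descriptors have already been extracted, each with an (x, y) location in the image. It also assumes a 1x1 / 2x2 / 4x4 grid pyramid without per-level weights and no reweighting of the two concatenated halves. All function names (`kmeans_codebook`, `assign_words`, `spatial_histogram`, `combined_histogram`, `intersection_kernel`) are hypothetical.

```python
import numpy as np


def assign_words(descriptors, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans_codebook(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook of visual words."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(iters):
        labels = assign_words(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the previous centre if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers


def spatial_histogram(words, xy, width, height, k, levels=2):
    """Word counts over 1x1, 2x2, ..., 2^L x 2^L grids, concatenated and L1-normalised."""
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell of each descriptor; clamp points lying on the right/bottom border.
        cx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)
        hist = np.zeros((cells * cells, k))
        np.add.at(hist, (cy * cells + cx, words), 1)
        parts.append(hist.ravel())
    h = np.concatenate(parts)
    return h / max(h.sum(), 1.0)


def combined_histogram(sift, sift_xy, deconv, deconv_xy, cb_sift, cb_deconv, width, height):
    """Concatenate the SIFT and Deconvolutional Network spatial histograms of one image."""
    h_sift = spatial_histogram(
        assign_words(sift, cb_sift), sift_xy, width, height, len(cb_sift))
    h_deconv = spatial_histogram(
        assign_words(deconv, cb_deconv), deconv_xy, width, height, len(cb_deconv))
    # Each half sums to 1, so the result sums to 2; no reweighting of the halves (assumption).
    return np.concatenate([h_sift, h_deconv])


def intersection_kernel(A, B):
    """Gram matrix K[i, j] = sum_d min(A[i, d], B[j, d]) between rows of A and B."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)
```

The Gram matrix returned by `intersection_kernel` can be passed to any SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Train on `intersection_kernel(train, train)` and predict with `intersection_kernel(test, train)`. The histogram intersection kernel is positive semi-definite on non-negative histograms, which is why it can be used directly in an SVM.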
