A scene recognition method using sparse features with layout-sensitive pooling and extreme learning machine

Scene recognition aims to assign a semantic label to a scene, i.e., it helps intelligent machines know where they are, and it applies to a wide range of tasks in computer vision and robotics. Most pioneering methods extracted a set of low-level features and fed them directly into a classifier to identify the scene category, but low-level features have been shown to perform poorly in this setting. Researchers now aim to bridge the semantic gap between low-level visual features and high-level semantic categories in order to improve recognition performance, so much attention has been paid to transforming low-level descriptors into richer intermediate representations. This paper proposes a novel method, based on an intermediate feature representation, for recognizing the semantic category of a scene image. The proposed method applies sparse coding to SIFT features and introduces a spatial-layout-sensitive pooling method. The spatial layout for pooling is based on three partitions of each image, of size 1×1, 1×4, and 4×1. They are derived from inherent characteristics of scene images by regularly dividing the image in the horizontal and vertical directions. This pooling strategy is simple and yields an effective representation of scene images. An extreme learning machine (ELM), which has shown a strong ability to fit nonlinear classification boundaries, is used as the classifier. Experimental results show that the proposed method not only extracts lower-dimensional image features but also outperforms comparable state-of-the-art methods in recognition performance.
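
The pooling and classification stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The names `layout_pool` and `ELM` and all parameters are hypothetical. Several details are assumptions that the abstract does not specify: that 1×4 means one row by four columns, that pooling is max pooling over absolute sparse codes followed by L2 normalization, and that the ELM uses a sigmoid hidden layer with a ridge-regularized least-squares output layer. Sparse coding of the SIFT descriptors is assumed to have been done already.

```python
import numpy as np

def layout_pool(codes, xy, width, height, grids=((1, 1), (1, 4), (4, 1))):
    """Max-pool absolute sparse codes over every cell of each (rows, cols) grid.

    codes: (N, K) sparse codes, one row per local descriptor.
    xy:    (N, 2) descriptor positions as (x, y) pixel coordinates.
    Returns an L2-normalized vector of length K * (number of cells).
    """
    codes = np.abs(np.asarray(codes, dtype=float))
    xy = np.asarray(xy, dtype=float)
    pooled = []
    for rows, cols in grids:
        # Map each descriptor to a cell index; clamp points lying on the far edge.
        col = np.minimum((xy[:, 0] / width * cols).astype(int), cols - 1)
        row = np.minimum((xy[:, 1] / height * rows).astype(int), rows - 1)
        cell = row * cols + col
        for c in range(rows * cols):
            mask = cell == c
            # Empty cells contribute a zero vector (an assumption of this sketch).
            pooled.append(codes[mask].max(axis=0) if mask.any()
                          else np.zeros(codes.shape[1]))
    feat = np.concatenate(pooled)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat


class ELM:
    """Single-hidden-layer ELM: random fixed hidden weights, closed-form output weights."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        # Hidden-layer parameters are drawn once and never trained.
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # One-vs-all targets in {-1, +1}.
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)
        # Ridge-regularized least squares for the output weights.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

With these three grids the pooled vector has 9K entries for a dictionary of K atoms (1 + 4 + 4 cells), whereas a standard three-level spatial pyramid with 1×1, 2×2, and 4×4 grids has 21 cells. A caller would compute `layout_pool` once per image and pass the stacked vectors to `ELM().fit(features, labels)`.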
