Indoor Image Representation by High-Level Semantic Features

Indoor image feature extraction is a fundamental problem in fields such as image processing, pattern recognition, and robotics. Nevertheless, most existing feature extraction methods, which derive features from pixels, color, shape or object parts, or whole objects in images, have limited ability to describe semantic information (e.g., object association). As a result, these techniques yield unsatisfactory classification performance. To tackle this issue, we propose the notion of high-level semantic features and design four steps to extract them. Specifically, we first construct an object pattern dictionary by extracting raw objects from the images, and then retrieve and extract semantic objects from this dictionary. Finally, we extract our high-level semantic features based on a calculated probability and a delta parameter. Experiments on three publicly available datasets (MIT-67, Scene15, and NYU V1) show that our feature extraction approach outperforms state-of-the-art feature extraction methods for indoor image classification, while our features have a lower dimension than those of the compared methods.
