Image classification by search with explicitly and implicitly semantic representations

We model the explicitly and implicitly semantic representations for image classification.Explicitly semantic representations are obtained using learned models with supervision.Implicitly semantic representations are obtained with information from the Internet.The explicitly and implicitly semantic based method achieves good performances. Image classification refers to the task of automatically classifying the categories of images based on the contents. This task is typically solved using visual features with the histogram based classification scheme. Although effective, this strategy has two drawbacks. On one hand, histogram based representation often disregards the object layout which is very important for classification. On the other hand, visual features are unable to fully separate different images due to the semantic gap. To solve these two problems, in this paper, we propose a novel image classification method by explicitly and implicitly representing the images with searching strategy. First, to make use of object layouts, we randomly select a number of regions and then use these regions for image representations. Second, we generate the explicitly semantic representations using a number of pre-learned semantic models. Third, we measure the visual similarities with the Internet images and use the text information for implicitly semantic representations. Since Internet images are contaminated with noise, the resulting representations only implicitly reflect the contents of images. Finally, both the explicitly and implicitly semantic representations are jointly modeled for image classifications by training bi-linear classifiers. We evaluate the effectiveness of the proposed image classification by search with explicitly and implicitly semantic representations method (EISR) on the Scene-15 dataset, the MIT-Indoor dataset, the UIUC-Sports dataset and the PASCAL VOC 2007 dataset. The experimental results prove the usefulness of the proposed method.

[1]  Alex Pentland,et al.  Task-Specific Gesture Analysis in Real-Time Using Interpolated Views , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[3]  Liang-Tien Chia,et al.  Laplacian Sparse Coding, Hypergraph Laplacian Sparse Coding, and Applications , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Alexei A. Efros,et al.  Mid-level Visual Element Discovery as Discriminative Mode Seeking , 2013, NIPS.

[5]  Qi Tian,et al.  Boosted random contextual semantic space based representation for visual recognition , 2016, Inf. Sci..

[6]  Qi Tian,et al.  Image classification using boosted local features with random orientation and location selection , 2015, Inf. Sci..

[7]  Qi Tian,et al.  Image classification using Harr-like transformation of local features with coding residuals , 2013, Signal Process..

[8]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[9]  Wei-Ying Ma,et al.  Duplicate-Search-Based Image Annotation Using Web-Scale Data , 2012, Proceedings of the IEEE.

[10]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[11]  Yuan Yan Tang,et al.  High-Order Distance-Based Multiview Stochastic Learning in Image Classification , 2014, IEEE Transactions on Cybernetics.

[12]  Ying He,et al.  Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation , 2014, IEEE Trans. Knowl. Data Eng..

[13]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Changsheng Xu,et al.  Low-Rank Sparse Coding for Image Classification , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Ying He,et al.  Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation , 2011, IEEE Transactions on Knowledge and Data Engineering.

[16]  Qi Tian,et al.  A Boosting, Sparsity- Constrained Bilinear Model for Object Recognition , 2012, IEEE MultiMedia.

[17]  Qi Tian,et al.  Beyond visual features: A weak semantic image representation using exemplar classifiers for classification , 2013, Neurocomputing.

[18]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[19]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Chong-Wah Ngo,et al.  On the Annotation of Web Videos by Efficient Near-Duplicate Search , 2010, IEEE Transactions on Multimedia.

[21]  Qi Tian,et al.  Object categorization in sub-semantic space , 2014, Neurocomputing.

[22]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[23]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[24]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[25]  Jun Yu,et al.  Multi-view ensemble manifold regularization for 3D object recognition , 2015, Inf. Sci..

[26]  Fei-Fei Li,et al.  Object-Centric Spatial Pooling for Image Classification , 2012, ECCV.

[27]  Nuno Vasconcelos,et al.  Scene classification with semantic Fisher vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Nenghai Yu,et al.  Semantics-Preserving Bag-of-Words Models and Applications , 2010, IEEE Transactions on Image Processing.

[29]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Cewu Lu,et al.  Learning Important Spatial Pooling Regions for Scene Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[33]  Jun Yu,et al.  Semantic preserving distance metric learning and applications , 2014, Inf. Sci..

[34]  Chun Chen,et al.  Graph Regularized Sparse Coding for Image Representation , 2011, IEEE Transactions on Image Processing.

[35]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[36]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, CVPR.

[37]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[38]  Nuno Vasconcelos,et al.  Holistic Context Models for Visual Recognition , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[40]  Yi Ma,et al.  Learning Category-Specific Dictionary and Shared Dictionary for Fine-Grained Image Categorization , 2014, IEEE Transactions on Image Processing.

[41]  Nuno Vasconcelos,et al.  Scene classification with low-dimensional semantic spaces and weak supervision , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Yanhui Xiao,et al.  Kernel Reconstruction ICA for Sparse Representation , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[43]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[44]  Wei-Ying Ma,et al.  AnnoSearch: Image Auto-Annotation by Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[45]  Fereshteh Sadeghi,et al.  Latent Pyramidal Regions for Recognizing Scenes , 2012, ECCV.

[46]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[48]  Qingming Huang,et al.  Image classification by non-negative sparse coding, correlation constrained low-rank and sparse decomposition , 2014, Comput. Vis. Image Underst..

[49]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[50]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.