Image-level classification by hierarchical structure learning with visual and semantic similarities

Abstract

Image classification methods often rely on class-level information and ignore the distinctive characteristics of individual images. Images of the same class can vary widely in appearance. Moreover, visually similar images are not necessarily semantically related. To address these problems, we propose a novel image classification method that automatically learns an image-level hierarchical structure (ILHS) from both visual and semantic similarities, with the aim of generating new image representations. Images are first clustered hierarchically to capture their correlations, and the resulting clusters are then used to represent the images. The diversity of image classes within each cluster is used to re-weight the visual similarities, and the re-weighted similarities are aggregated into new image representations. Image classification experiments on the Caltech-256, PASCAL VOC 2007 and PASCAL VOC 2012 datasets demonstrate the effectiveness of the proposed method.
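The abstract does not specify the clustering algorithm, the diversity measure, or the aggregation rule. The Python sketch below therefore fills each gap with an explicit assumption. It is a hypothetical illustration of the pipeline, not the authors' ILHS method. The assumptions are as follows. The hierarchy is built by bisecting 2-means under cosine similarity. Cluster diversity is the Shannon entropy of the class labels, which is the only place semantics enter. Each cluster's weight is 1 / (1 + entropy). Aggregation concatenates the weighted similarities of an image to every cluster centroid. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def centroid(indices, feats):
    """Mean feature vector of the images in one cluster."""
    dim = len(feats[0])
    return [sum(feats[i][d] for i in indices) / len(indices) for d in range(dim)]


def class_entropy(indices, labels):
    """Shannon entropy of the class labels in a cluster (0.0 for a single-class cluster).

    Assumption: entropy stands in for the paper's unspecified 'class diversity'.
    """
    counts = Counter(labels[i] for i in indices)
    n = len(indices)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def bisect(indices, feats, iters=5):
    """Split one cluster in two with a few rounds of 2-means under cosine similarity.

    Seeds are deterministic: the first member and the member least similar to it.
    """
    s0 = indices[0]
    s1 = min(indices, key=lambda i: cosine(feats[i], feats[s0]))
    c0, c1 = feats[s0], feats[s1]
    left, right = indices, []
    for _ in range(iters):
        left = [i for i in indices if cosine(feats[i], c0) >= cosine(feats[i], c1)]
        left_set = set(left)
        right = [i for i in indices if i not in left_set]
        if not left or not right:
            break  # degenerate split (e.g. identical features): caller keeps the cluster whole
        c0, c1 = centroid(left, feats), centroid(right, feats)
    return left, right


def build_hierarchy(feats, depth):
    """Divisive clustering: each level splits every cluster of the level above."""
    levels, clusters = [], [list(range(len(feats)))]
    for _ in range(depth):
        nxt = []
        for c in clusters:
            left, right = bisect(c, feats) if len(c) > 1 else (c, [])
            nxt.extend([left, right] if left and right else [c])
        clusters = nxt
        levels.append(clusters)
    return levels


def purity_weighted_representation(x, feats, labels, levels):
    """Concatenate the similarities of x to every cluster centroid, each scaled by
    1 / (1 + class entropy) so that class-diverse clusters contribute less."""
    rep = []
    for clusters in levels:
        for c in clusters:
            weight = 1.0 / (1.0 + class_entropy(c, labels))
            rep.append(weight * cosine(x, centroid(c, feats)))
    return rep
```

A query can be encoded with `purity_weighted_representation(query, feats, labels, build_hierarchy(feats, depth=2))`, and the resulting vector can be passed to any standard classifier. The 1 / (1 + entropy) weight is one bounded choice among many. It equals 1 for a single-class cluster and decreases as more classes mix within the cluster.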