Integrating global and local visual features with semantic hierarchies for two-level image annotation

Image annotation is a challenging task due to the semantic gap between low-level visual features and high-level human concepts. Most previous annotation methods treat the task as a multilabel classification problem. However, these methods often suffer from poor accuracy and efficiency when they encounter large visual variations and large semantic vocabularies. In this paper, we focus on two-level image annotation by integrating both global and local visual features with semantic hierarchies, in an effort to simultaneously learn annotation correspondences in a relatively small and highly relevant subspace. Given an image, the two-level task consists of scene classification for the image and object labeling for its regions. For scene classification, we first define several specific scenes that describe most cases in the given image data, and then use support vector machines (SVMs) based on the global features. For region labeling, we first form a set of abstract nouns in accordance with WordNet to define relevant objects, and then use local support tensor machines (LSTMs) based on high-order regional features. By introducing a new conditional random field (CRF) model that exploits the multiple correlations with respect to scene-object hierarchies and object-object relationships, our system achieves a more hierarchical and coherent description of image contents than simpler image annotation tasks do. Experimental results on the MSRC and SAIAPR datasets validate the superiority of using multiple visual features and prior semantic correlations for image annotation in comparison with several state-of-the-art methods.
Highlights
Our method offers a third-order tensor to capture high-order statistics of regions.
Local support tensor machines (LSTMs) are proposed for accurate region labeling.
A conditional random field (CRF) model is designed for two-level image annotation.
The impacts of using semantic hierarchies are investigated for image annotation.
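
The idea of coupling scene classification with region labeling can be illustrated with a toy example. The sketch below is not the CRF model of the paper. It is a minimal, exhaustive search over one scene label and one object label per region, assuming hypothetical inputs: per-scene scores (standing in for SVM outputs), per-region object scores (standing in for LSTM outputs), a scene-object compatibility table, and an object-object compatibility table for adjacent regions. The function name `joint_annotate` and all numeric values are invented for illustration.

```python
from itertools import product


def joint_annotate(scene_scores, region_scores, scene_object, object_object, edges):
    """Exhaustively search for the best joint (scene, region labels) assignment.

    All arguments are hypothetical stand-ins, not the paper's actual potentials:
      scene_scores[s]      score of scene s for the whole image
      region_scores[i][o]  score of object o for region i
      scene_object[s][o]   compatibility of object o with scene s
      object_object[a][b]  compatibility of objects a and b on adjacent regions
      edges                list of (i, j) pairs of adjacent regions
    """
    n_objects = len(region_scores[0])
    best = (float("-inf"), None, None)
    for scene in range(len(scene_scores)):
        for labels in product(range(n_objects), repeat=len(region_scores)):
            total = scene_scores[scene]
            for i, obj in enumerate(labels):
                total += region_scores[i][obj] + scene_object[scene][obj]
            for i, j in edges:
                total += object_object[labels[i]][labels[j]]
            if total > best[0]:
                best = (total, scene, list(labels))
    return best[1], best[2], best[0]


# Toy setup: scenes = [outdoor, indoor], objects = [sky, grass, table].
scene_scores = [0.2, 0.3]  # scene scores alone slightly favor "indoor"
region_scores = [
    [1.0, 0.1, 0.0],  # region 0 looks like sky
    [0.0, 0.9, 0.8],  # region 1 is ambiguous between grass and table
]
scene_object = [
    [0.5, 0.5, -0.5],   # outdoor goes with sky and grass
    [-0.5, -0.5, 0.5],  # indoor goes with table
]
object_object = [
    [0.0, 0.3, 0.0],  # sky and grass tend to be adjacent
    [0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
edges = [(0, 1)]

scene, labels, score = joint_annotate(
    scene_scores, region_scores, scene_object, object_object, edges
)
print(scene, labels, round(score, 2))
```

In this toy setup the scene scores alone slightly prefer the second scene, but the joint search selects the first scene with labels sky and grass, because the scene-object and object-object terms outweigh the small difference in scene scores. Exhaustive search is only feasible for a handful of regions and labels; a practical system would need an approximate inference procedure.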
