Automatic image annotation by combining generative and discriminant models

Image annotation methods based on generative models have achieved good annotation performance. However, because of the semantic gap, these methods suffer from images that have similar visual features but different semantics. Discriminative models, which have shown excellent generalization performance, are a promising way to separate such images from the semantically relevant ones. To gain the benefits of both generative and discriminative approaches, in this paper we propose a novel image annotation approach that combines generative and discriminative models through local discriminant topics in the neighborhood of the unlabeled image. Singular Value Decomposition (SVD) is applied to group the images in the neighborhood into different topics according to their semantic labels, so that semantically relevant and irrelevant images are assigned to different topics. By exploiting the discriminant information between topics, a Support Vector Machine (SVM) classifies the unlabeled image into the relevant topic, from which a more accurate annotation is obtained because the adverse influence of irrelevant images is reduced. Experiments on the ECCV 2002 [3] and NUS-WIDE [34] benchmarks show that our method outperforms state-of-the-art annotation models.
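The pipeline described above (neighborhood, SVD-based topic grouping, SVM topic classification, annotation from the selected topic) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy features and vocabulary, the function names, the rule that assigns each image to the latent dimension with the largest absolute loading, the restriction to two topics, and the small subgradient-trained linear SVM that stands in for a full SVM solver.

```python
import numpy as np

def neighborhood(feats, query, k):
    """Indices (in ascending order) of the k images closest to the query."""
    dist = np.linalg.norm(feats - query, axis=1)
    return np.sort(np.argsort(dist)[:k])

def svd_topics(label_matrix, n_topics=2):
    """Group images into topics from their labels (assumed rule:
    largest absolute loading on the leading singular vectors)."""
    U, s, _ = np.linalg.svd(label_matrix.astype(float), full_matrices=False)
    loadings = np.abs(U[:, :n_topics] * s[:n_topics])
    return loadings.argmax(axis=1)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Stand-in linear SVM: subgradient descent on the regularized hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1          # points violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def annotate(query, feats, labels, vocab, k=6, n_words=3):
    """Annotate the query with the most frequent words of its predicted topic."""
    idx = neighborhood(feats, query, k)
    X, L = feats[idx], labels[idx]
    topics = svd_topics(L)                    # two topics: 0 and 1
    mean = X.mean(axis=0)                     # center features for the SVM
    y = np.where(topics == 1, 1.0, -1.0)
    w, b = train_linear_svm(X - mean, y)
    topic = 1 if (query - mean) @ w + b > 0 else 0
    counts = L[topics == topic].sum(axis=0)   # word frequencies in that topic
    top = np.argsort(-counts, kind="stable")[:n_words]
    return [vocab[i] for i in top]

# Hypothetical toy data: 2-D visual features and binary word labels.
vocab = ["tiger", "grass", "forest", "car", "road"]
feats = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.3],
                  [2.0, 2.2], [2.3, 1.9], [1.9, 2.4],
                  [9.0, 9.0]])
labels = np.array([[1, 1, 1, 0, 0]] * 3 + [[0, 0, 0, 1, 1]] * 3 + [[0, 0, 0, 0, 1]])
query = np.array([1.1, 1.1])

print(annotate(query, feats, labels, vocab))
```

In this sketch the words of the neighbors that fall outside the predicted topic do not contribute to the annotation, which mirrors the idea of reducing the influence of semantically irrelevant images. A practical implementation would need a multi-topic classifier and a guard for neighborhoods in which all images land in a single topic.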

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004).

[3]  Daniel Gatica-Perez,et al.  PLSA-based image auto-annotation: constraining the latent space , 2004, MULTIMEDIA '04.

[4]  Xiangdong Zhou,et al.  Effective automatic image annotation via integrated discriminative and generative models , 2014, Inf. Sci..

[5]  Robert Tibshirani,et al.  Discriminant Adaptive Nearest Neighbor Classification , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Tat-Seng Chua,et al.  Image Annotation by Graph-Based Inference With Integrated Multiple/Single Instance Representations , 2010, IEEE Transactions on Multimedia.

[7]  Jonathon S. Hare,et al.  Mind the gap: another look at the problem of the semantic gap in image retrieval , 2006, Electronic Imaging.

[8]  Jing Hua,et al.  Region-based Image Annotation using Asymmetrical Support Vector Machine-based Multiple-Instance Learning , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  David A. Forsyth,et al.  Clustering art , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001).

[10]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[11]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[12]  Daniel Gatica-Perez,et al.  On image auto-annotation with latent space models , 2003, ACM Multimedia.

[13]  Fei Wang,et al.  Label Propagation through Linear Neighborhoods , 2008, IEEE Trans. Knowl. Data Eng..

[14]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[16]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Meng Wang,et al.  Learning Visual Semantic Relationships for Efficient Visual Retrieval , 2015, IEEE Transactions on Big Data.

[18]  R. Manmatha,et al.  An Inference Network Approach to Image Retrieval , 2004, CIVR.

[19]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[20]  Wei-Ying Ma,et al.  AnnoSearch: Image Auto-Annotation by Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  Chin-Hui Lee,et al.  Bayesian Learning of Hierarchical Multinomial Mixture Models of Concepts for Automatic Image Annotation , 2006, CIVR.

[22]  Xuelong Li,et al.  Image Annotation by Multiple-Instance Learning With Discriminative Feature Mapping and Selection , 2014, IEEE Transactions on Cybernetics.

[23]  Yi Li,et al.  ARISTA - image search to annotation on billions of web photos , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[25]  Gustavo Carneiro,et al.  Formulating semantic image annotation as a supervised learning problem , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[27]  Jianping Fan,et al.  Incorporating concept ontology to enable probabilistic concept reasoning for multi-level image annotation , 2006, MIR '06.

[28]  Antonio Torralba,et al.  Object and scene recognition in tiny images , 2010 .

[29]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[30]  Antonio Torralba,et al.  80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Jianping Fan,et al.  Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers , 2006, MM '06.

[32]  James Ze Wang,et al.  Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Hung-Khoon Tan,et al.  Beyond search: Event-driven summarization for web videos , 2011, TOMCCAP.