Automatic image annotation using inverse maps from semantic embeddings

Human annotation in large scale image databases is time-consuming and error-prone. Since it is very hard to mine image databases using just visual features or textual descriptors, it is common to transform the image features into a semantically meaningful space. In this paper, we propose to perform image annotation in a semantic space inferred based on sparse representations. By constructing a semantic embedding for the visual features, that is constrained to be close to the tag embedding, we show that a robust inverse map can be used to predict the tags. Experiments using standard datasets show the effectiveness of the proposed approach in automatic image annotation when compared to existing methods.

[1]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[2]  Bengt Fornberg,et al.  Inverting Non-Linear Dimensionality Reduction with Scale-Free Radial Basis Interpolation , 2013, ArXiv.

[3]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[4]  W. Boothby An introduction to differentiable manifolds and Riemannian geometry , 1975 .

[5]  Wotao Yin,et al.  A feasible method for optimization with orthogonality constraints , 2013, Math. Program..

[6]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[7]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Edward Y. Chang,et al.  CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines , 2003, IEEE Trans. Circuits Syst. Video Technol..

[9]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[10]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.

[11]  Honggang Zhang,et al.  Reference-Based Scheme Combined With K-SVD for Scene Image Categorization , 2013, IEEE Signal Processing Letters.

[12]  Lei Zhang,et al.  Multi-label sparse coding for automatic image annotation , 2009, CVPR.

[13]  Michael Isard,et al.  A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.

[14]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[15]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[16]  Raimondo Schettini,et al.  Image annotation using SVM , 2003, IS&T/SPIE Electronic Imaging.

[17]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[18]  Yang Yu,et al.  Automatic image annotation using group sparsity , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.