ANNOR: Efficient image annotation based on combining local and global features

Automatic image annotation methods based on searching for correlations require a high-quality training image dataset. The annotation of a target image is predicted from its similarity to the training images. A main problem of current methods is their low effectiveness and poor scalability when a relatively large training dataset is used. In this paper we describe our approach, “Automatic image aNNOtation Retriever” (ANNOR), which acquires annotations for target images by combining local and global features. ANNOR is resistant to common transforms (cropping, scaling) that traditional approaches based on global features alone cannot cope with. It provides the robustness and generalization needed by complex queries and eliminates a significant share of irrelevant results. We identify objects directly in the target images, and for each obtained annotation we estimate the probability of its relevance. We focus on how people manually annotate images (human aspects of image perception). We have designed ANNOR to work with large-scale training image datasets. We present experimental results for three challenging (baseline) datasets, on which ANNOR improves on the current state of the art.
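The similarity-based prediction described in the abstract can be illustrated with a minimal sketch. It is not ANNOR itself: it assumes a generic k-nearest-neighbour tag transfer over a single global feature vector per image, and the names `annotate`, `euclidean`, and the parameter `k` are hypothetical. The score attached to each tag is only a rough stand-in for the relevance probability the abstract mentions.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotate(target, training, k=3):
    """Transfer tags from the k training images nearest to the target.

    target   -- global feature vector of the target image (assumed given)
    training -- list of (feature_vector, tags) pairs
    Returns {tag: score}; each score is the similarity-weighted share of
    the k neighbours that carry the tag, so it lies in (0, 1].
    """
    # Illustrative assumption: a brute-force scan, not a scalable index.
    neighbours = sorted(training, key=lambda item: euclidean(target, item[0]))[:k]
    # Closer neighbours get larger weights.
    weights = [1.0 / (1.0 + euclidean(target, feats)) for feats, _ in neighbours]
    total = sum(weights)
    scores = defaultdict(float)
    for weight, (_, tags) in zip(weights, neighbours):
        for tag in set(tags):
            scores[tag] += weight / total
    return dict(scores)


training = [
    ([0.0, 0.0], ["sky", "sea"]),
    ([0.0, 1.0], ["sky"]),
    ([10.0, 10.0], ["car"]),
]
scores = annotate([0.0, 0.0], training, k=2)
```

With this toy data the two nearest images both carry "sky", so that tag gets the highest score, while "car" comes only from the distant image and is not returned. A large-scale system would replace the brute-force scan with an index and would add local features to handle cropped or scaled targets.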
