Empirical Comparison of Automatic Image Annotation Systems

The performance of content-based image retrieval systems has proved to be inherently constrained by the low-level features used: when the user's high-level concepts cannot be expressed in terms of those features, the results are unsatisfactory. In an attempt to bridge this semantic gap, recent approaches integrate both low-level visual features and high-level textual keywords. Unfortunately, manual image annotation is a tedious process and may not be feasible for large image databases. To overcome this limitation, several approaches that annotate images in a semi-supervised or unsupervised way have emerged. In this paper, we outline and compare four such algorithms. The first is simple and treats image annotation as the task of translating from a vocabulary of fixed image regions to a vocabulary of words. The second uses a set of annotated images as a training set and learns the joint distribution of regions and words. The third and fourth approaches segment the images into homogeneous regions and rely on a clustering algorithm to learn the association between visual features and keywords. This clustering task is not trivial, as it involves a very high-dimensional and sparse feature space. To address this, the third approach uses semi-supervised constrained clustering, while the fourth relies on an algorithm that performs simultaneous clustering and feature discrimination. All four algorithms were implemented and tested on a data set of 6,000 images using four-fold cross-validation.
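The translation-style approach described above can be sketched in a few lines. This is a hypothetical simplification, not the paper's implementation: it assumes each image's regions have already been segmented and vector-quantized into a discrete "blob" vocabulary, and it estimates P(word | blob) from co-occurrence counts in the annotated training set, then scores candidate keywords for a new image by summing those probabilities over its blobs.

```python
from collections import Counter, defaultdict

def train_cooccurrence(training_set):
    """Estimate P(word | blob) from (blob_ids, keywords) pairs.

    training_set: list of (blob_ids, keywords) per annotated image, where
    blob_ids index pre-quantized region clusters ("blobs"). In a real
    system, segmentation and vector quantization would produce these ids.
    """
    counts = defaultdict(Counter)  # blob -> keyword co-occurrence counts
    for blobs, words in training_set:
        for b in set(blobs):
            for w in set(words):
                counts[b][w] += 1
    # Normalize counts into conditional probabilities per blob.
    probs = {}
    for b, word_counts in counts.items():
        total = sum(word_counts.values())
        probs[b] = {w: c / total for w, c in word_counts.items()}
    return probs

def annotate(blobs, probs, top_k=2):
    """Score keywords by summing P(word | blob) over an image's blobs."""
    scores = Counter()
    for b in blobs:
        for w, p in probs.get(b, {}).items():
            scores[w] += p
    return [w for w, _ in scores.most_common(top_k)]

# Toy training data: two images, each given as (region blobs, keywords).
train = [
    ([0, 1], ["sky", "grass"]),
    ([0, 2], ["sky", "water"]),
]
model = train_cooccurrence(train)
print(annotate([0], model, top_k=1))  # blob 0 co-occurs most with "sky"
```

The joint-distribution and clustering-based approaches compared in the paper refine this basic idea, e.g. by modeling regions and words jointly rather than through independent per-blob conditionals.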
