Region Based Image Annotation

We propose an unsupervised approach to segment color images and annotate their regions. The annotation process uses a multi-modal thesaurus that is built from a large collection of training images by learning associations between low-level visual features and keywords. We assume that a collection of images is available and that each image is globally annotated, that is, its keywords describe the image as a whole rather than individual regions. The objective is to extract representative visual profiles that correspond to frequent homogeneous regions and to associate them with keywords. These labeled profiles are then used to build a multi-modal thesaurus that serves as a foundation for region-based annotation. Our approach has two main steps. First, each image is coarsely segmented into regions, and visual features are extracted from each region. Second, the regions are categorized using a novel algorithm that performs clustering and feature weighting simultaneously. As a result, we obtain clusters of regions that share subsets of relevant features. Representatives from each cluster, together with their relevant visual and textual features, are used to build the thesaurus. The proposed approach is trained with a collection of 2695 images and tested with several different images.
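The second step and the keyword association can be illustrated with a small sketch. It is not the algorithm proposed here: it swaps in a simplified k-means variant in which each cluster weights each feature by the inverse of its within-cluster dispersion, and it labels each cluster with the most frequent keyword inherited from the source images. The function names (`weighted_kmeans`, `label_clusters`), the two-dimensional (color, texture) features, the deterministic initialization, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def weighted_kmeans(points, k, iters=20, eps=1e-6):
    """Simplified clustering with per-cluster feature weights (illustrative only).

    Assumption: weights are set to the normalized inverse of each feature's
    within-cluster dispersion, so features on which a cluster is compact
    count more in that cluster's distance.
    """
    dim = len(points[0])
    # Assumption: deterministic initialization from evenly spaced points.
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    weights = [[1.0 / dim] * dim for _ in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center under that cluster's own weights.
        for n, p in enumerate(points):
            dists = [
                sum(w * (x - c) ** 2 for w, x, c in zip(weights[j], p, centers[j]))
                for j in range(k)
            ]
            labels[n] = dists.index(min(dists))
        # Update step: recompute centers, then feature weights per cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # keep the previous center and weights
            centers[j] = [sum(col) / len(members) for col in zip(*members)]
            disp = [
                sum((p[d] - centers[j][d]) ** 2 for p in members) / len(members)
                for d in range(dim)
            ]
            inv = [1.0 / (v + eps) for v in disp]
            total = sum(inv)
            weights[j] = [v / total for v in inv]
    return labels, centers, weights


def label_clusters(labels, region_keywords, k):
    """Pick, for each cluster, the keyword most often attached to its regions."""
    counts = [Counter() for _ in range(k)]
    for lab, words in zip(labels, region_keywords):
        counts[lab].update(words)
    return [c.most_common(1)[0][0] if c else None for c in counts]


# Hypothetical regions described by (color, texture) features.
# The first four are consistent in color, the last four in texture.
regions = [
    (0.90, 0.10), (0.91, 0.50), (0.89, 0.30), (0.90, 0.70),
    (0.20, 0.80), (0.40, 0.81), (0.30, 0.79), (0.10, 0.80),
]
# Each region inherits the global keywords of the image it came from.
region_keywords = [
    ["sky", "grass"], ["sky", "water"], ["sky", "grass"], ["sky", "tree"],
    ["grass", "sky"], ["grass", "tree"], ["grass", "water"], ["grass", "sky"],
]

labels, centers, weights = weighted_kmeans(regions, k=2)
thesaurus = label_clusters(labels, region_keywords, k=2)
for j in range(2):
    print(j, thesaurus[j], centers[j], weights[j])
```

In this toy setting each cluster ends up relying mostly on the feature along which its regions agree, which is the intuition behind clusters that "share subsets of relevant features". The global keywords are noisy at the region level, so the sketch relies on frequency within a cluster to single out the keyword that most plausibly describes it.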
