Leveraging auxiliary text terms for automatic image annotation

This paper proposes a novel algorithm to annotate web images by automatically aligning the images with their most relevant auxiliary text terms. First, the DOM-based web page segmentation is performed to extract images and their most relevant auxiliary text blocks. Second, automatic image clustering is used to partition the web images into a set of groups according to their visual similarity contexts, which significantly reduces the uncertainty on the relatedness between the images and their auxiliary terms. The semantics of the visually-similar images in the same cluster are then described by the same ranked list of terms which frequently co-occur in their text blocks. Finally, a relevance re-ranking process is performed over a term correlation network to further refine the ranked term list. Our experiments on a large-scale database of web pages have provided very positive results.

[1]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[2]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[3]  Steven Bird,et al.  NLTK: The Natural Language Toolkit , 2002, ACL.