Exploiting surrounding text for retrieving web images

Web documents contain useful textual information that can be exploited for describing images. Research had focused on representing images by means of their low-level content descriptions, such as color, shape and texture, while little research had been directed at exploiting such textual information. The aim of this research was to systematically exploit the textual content of HTML documents for automatically indexing and ranking images embedded in web documents. A heuristic approach for locating and weighting the text surrounding web images, together with a modified tf.idf weighting scheme, was proposed. A precision-recall evaluation was conducted for ten queries, and promising results were achieved. The proposed approach showed slightly better precision than a popular search engine, with average relative precision of 0.63 and 0.55 respectively.
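The general idea of weighting text by where it appears relative to an image, and then applying a tf.idf-style score, can be illustrated with a minimal sketch. This is not the scheme proposed in the paper: the source names (`alt`, `title`, `surrounding`), the weight values in `SOURCE_WEIGHTS`, the plain `log(N / df)` idf, and the function names are all assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical weights per text source (assumed values, not taken from the paper).
SOURCE_WEIGHTS = {"alt": 3.0, "title": 2.0, "surrounding": 1.0}


def weighted_tf(fields):
    """Sum term occurrences for one image, scaled by the weight of their source."""
    tf = Counter()
    for source, terms in fields.items():
        weight = SOURCE_WEIGHTS.get(source, 1.0)
        for term in terms:
            tf[term] += weight
    return tf


def build_index(images):
    """Map each image id to a dict of term -> weighted tf * idf."""
    tfs = {img: weighted_tf(fields) for img, fields in images.items()}
    df = Counter()  # number of images whose text contains each term
    for tf in tfs.values():
        df.update(tf.keys())
    n = len(images)
    return {
        img: {term: freq * math.log(n / df[term]) for term, freq in tf.items()}
        for img, tf in tfs.items()
    }


def rank(index, query_terms):
    """Return image ids with a positive score, best match first."""
    scores = {
        img: sum(weights.get(term, 0.0) for term in query_terms)
        for img, weights in index.items()
    }
    matched = [img for img, score in scores.items() if score > 0]
    return sorted(matched, key=lambda img: -scores[img])
```

In this sketch, a query term found in an image's `alt` text contributes more to the score than the same term found only in nearby body text, so images described directly by their markup tend to rank higher.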
