PicASHOW: pictorial authority search by hyperlinks on the Web

We describe PicASHOW, a fully automated WWW image retrieval system based on several link-structure analysis algorithms. Our basic premise is that a page p displays (or links to) an image when the author of p considers the image to be of value to the viewers of the page. We thus extend some well-known link-based WWW page retrieval schemes to the context of image retrieval.

PicASHOW's analysis of the link structure enables it to retrieve relevant images even when they are stored in files with meaningless names. The same analysis also allows it to identify image containers and image hubs, which we define as Web pages that are rich in relevant images, or from which many images are readily accessible.

PicASHOW requires no image analysis whatsoever and no creation of taxonomies for preclassification of the Web's images. It can be implemented by standard WWW search engines with reasonable overhead, in terms of both computation and storage, and with no change to user query formats. It can thus be used to easily add image retrieval capabilities to standard search engines.

Our results demonstrate that PicASHOW, while relying almost exclusively on link analysis, compares well with dedicated WWW image retrieval systems. We conclude that link analysis, a proven and effective technique for Web page search, can improve the performance of Web image retrieval, as well as extend its definition to include the retrieval of image hubs and containers.
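To make the premise concrete, the following is a minimal sketch of one way a hub/authority iteration could be carried over from pages to images. It is an illustration under stated assumptions, not the PicASHOW implementation: the function name `image_authorities`, the toy pages `a`, `b`, `c`, and the file names are all hypothetical, and the rule that a page "endorses" an image if it displays the image or links to a page that displays it is our reading of the premise above.

```python
from math import sqrt


def image_authorities(page_links, page_images, iterations=50):
    """HITS-style mutual reinforcement on a page -> image graph (illustrative).

    page_links:  dict mapping a page to the pages it links to
    page_images: dict mapping a page to the images it displays
    Returns (image authority scores, page hub scores).
    """
    pages = set(page_links) | set(page_images)
    # Assumption: a page endorses the images it displays, plus the images
    # displayed by the pages it links to directly.
    reach = {}
    for p in pages:
        imgs = set(page_images.get(p, ()))
        for q in page_links.get(p, ()):
            imgs |= set(page_images.get(q, ()))
        reach[p] = imgs

    images = sorted({i for imgs in reach.values() for i in imgs})
    hub = {p: 1.0 for p in reach}
    auth = {i: 1.0 for i in images}

    for _ in range(iterations):
        # An image is authoritative if good hubs endorse it.
        auth = {i: sum(hub[p] for p in reach if i in reach[p]) for i in images}
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {i: v / norm for i, v in auth.items()}
        # A page is a good hub if it endorses authoritative images.
        hub = {p: sum(auth[i] for i in reach[p]) for p in reach}
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub


# Toy collection (hypothetical): "x17.gif" has an uninformative file name,
# but every page either displays it or links to a page that does.
page_links = {"a": ["b"], "b": [], "c": ["b"]}
page_images = {"a": ["cat.jpg"], "b": ["x17.gif"], "c": ["x17.gif", "dog.png"]}

auth, hub = image_authorities(page_links, page_images)
best_image = max(auth, key=auth.get)
```

In this toy collection `best_image` is `x17.gif`: the image is ranked by how pages point at it, not by its file name. The hub scores play the role of a rough "image hub" ranking, with pages `a` and `c` scoring above `b` because they give access to more images.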

[17]  Aya Soffer,et al.  PicASHOW: pictorial authority search by hyperlinks on the web , 2002, ACM Trans. Inf. Syst..