Hierarchical clustering of WWW image search results using visual, textual and link information

We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the results into different semantic clusters facilitates users' browsing. In this paper, we propose a hierarchical clustering method using visual, textual and link analysis. By using a vision-based page segmentation algorithm, a web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. By using block-level link analysis techniques, an image graph can be constructed. We then apply spectral techniques to find a Euclidean embedding of the images which respects the graph structure. Thus for each image, we have three kinds of representations, i.e. visual feature based representation, textual feature based representation and graph based representation. Using spectral clustering techniques, we can cluster the search results into different semantic clusters. An image search example illustrates the potential of these techniques.

[1]  Aya Soffer,et al.  PicASHOW: pictorial authority search by hyperlinks on the Web , 2001, WWW '01.

[2]  Soumen Chakrabarti,et al.  Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction , 2001, WWW '01.

[3]  Wei-Ying Ma,et al.  Organizing WWW images based on the analysis of page layout and Web link structure , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[4]  W. Bruce Croft,et al.  An Evaluation of Techniques for Clustering Search Results , 2005 .

[5]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[6]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[7]  Kerry Rodden,et al.  Does organisation by similarity assist image browsing? , 2001, CHI.

[8]  Michael J. Swain,et al.  WebSeer: An Image Search Engine for the World Wide Web , 1996 .

[9]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[10]  Oren Etzioni,et al.  Grouper: A Dynamic Clustering Interface to Web Search Results , 1999, Comput. Networks.

[11]  Shiri Gordon,et al.  Applying the information bottleneck principle to unsupervised clustering of discrete and continuous image representations , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12]  B. S. Manjunath,et al.  Texture Features for Browsing and Retrieval of Image Data , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Mingjing Li,et al.  Color texture moments for content-based image retrieval , 2002, Proceedings. International Conference on Image Processing.

[14]  Yixin Chen,et al.  Content-based image retrieval by clustering , 2003, MIR '03.

[15]  Markus A. Stricker,et al.  Similarity of color images , 1995, Electronic Imaging.

[16]  Wei-Ying Ma,et al.  Improving pseudo-relevance feedback in web information retrieval using web page segmentation , 2003, WWW '03.

[17]  Wei-Ying Ma,et al.  VIPS: a Vision-based Page Segmentation Algorithm , 2003 .

[18]  Wei-Ying Ma,et al.  Imagerank : spectral techniques for structural analysis of image database , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[19]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Wei-Ying Ma,et al.  Block-based web search , 2004, SIGIR '04.

[21]  Wei-Ying Ma,et al.  Block-level link analysis , 2004, SIGIR '04.