Clustering Web Search Results Based on Interactive Suffix Tree Algorithm

Clustering is an effective way to organize Web search results, allowing users to navigate to relevant documents quickly. Traditional clustering techniques are inadequate for Chinese search results and do not generate clusters with highly readable names. In this paper, we propose a new method for clustering Web search results based on an interactive suffix tree algorithm (ISTC). The method uses phrases extracted from the snippets as clustering features. When interacting with users, it returns only the first-tier cluster labels. For further interaction, users can select a document they are interested in for a second round of clustering, instead of traditional recursive clustering. ISTC applies to both Chinese and English text and, by avoiding recursion, achieves linear time complexity and improves search engine efficiency. Experimental results verify our method's feasibility and effectiveness.
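
The phrase-based idea described above can be illustrated with a minimal sketch. It is not the paper's ISTC implementation: it swaps the suffix tree for a plain word n-gram index (which is not linear time), tokenizes on whitespace (Chinese text would need a word segmenter or character-level tokens), and scores a phrase by document count times phrase length. All function names and parameters (`phrases`, `base_clusters`, `first_tier_labels`, `second_tier_labels`, `max_len`, `min_docs`, `top_k`) are hypothetical, and the second-tier step is only one possible reading of "select a document for the second clustering".

```python
from collections import defaultdict


def phrases(snippet, max_len=3):
    """Yield every word n-gram (1..max_len words) found in a snippet."""
    words = snippet.lower().split()  # assumption: whitespace tokenization
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            yield tuple(words[i:j])


def base_clusters(snippets, min_docs=2, max_len=3):
    """Map each phrase to the set of snippet indices that contain it."""
    index = defaultdict(set)
    for doc_id, snippet in enumerate(snippets):
        for phrase in phrases(snippet, max_len):
            index[phrase].add(doc_id)
    # Keep only phrases shared by at least min_docs snippets.
    return {p: docs for p, docs in index.items() if len(docs) >= min_docs}


def first_tier_labels(snippets, top_k=5):
    """Return the top_k (label, document ids) pairs, labels only at tier one."""
    clusters = base_clusters(snippets)
    # Assumed score: number of documents times phrase length in words.
    ranked = sorted(
        clusters.items(),
        key=lambda kv: (-len(kv[1]) * len(kv[0]), kv[0]),
    )
    return [(" ".join(p), sorted(docs)) for p, docs in ranked[:top_k]]


def second_tier_labels(snippets, selected_doc, top_k=5):
    """Recluster only snippets that share a phrase with the selected one.

    Assumption: this is one interpretation of the interactive second step,
    done as a single extra pass rather than recursive clustering.
    """
    related = set()
    for docs in base_clusters(snippets).values():
        if selected_doc in docs:
            related |= docs
    subset = sorted(related)
    labels = first_tier_labels([snippets[i] for i in subset], top_k)
    # Map indices in the subset back to indices in the original list.
    return [(label, [subset[i] for i in docs]) for label, docs in labels]


snippets = [
    "suffix tree clustering of web results",
    "suffix tree clustering for search",
    "cooking pasta at home",
]
print(first_tier_labels(snippets, top_k=1))
print(second_tier_labels(snippets, selected_doc=0, top_k=1))
```

Using whole phrases as cluster keys is what gives each cluster a readable name, since the label is simply the shared phrase itself.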
