Landscape of Web Search Results Clustering Algorithms

Searching for information on the Web has attracted great attention in many research communities. Because of the enormous size of the Web and the low precision of user queries, present web search engines can return hundreds or even hundreds of thousands of documents for a single query, so finding the right information can be difficult, if not impossible. One approach to this problem is to use clustering techniques to group similar documents together, which allows results to be presented in a more compact form and enables thematic browsing of the result set. Web search results clustering is the efficient identification of meaningful, thematic groups of documents in a search result and their concise presentation. This paper introduces the problem of web search results clustering, briefly surveys previous work on it and existing commercial search engines that use the technique, and proposes possible directions for future research.
