Paper information - Semantic Suffix Tree Clustering

Semantic Suffix Tree Clustering

This paper proposes a new algorithm, called Semantic Suffix Tree Clustering (SSTC), to cluster web search results containing semantic similarities. The distinctive methodology of the SSTC algorithm is that it simultaneously constructs the semantic suffix tree through an on-depth and on-breadth pass by using semantic similarity and string matching. The semantic similarity is derived from the WordNet lexical database for the English language. SSTC uses only subject-verb-object classification to generate clusters and readable labels. The algorithm also implements directed pruning to reduce the sub-tree sizes and to separate semantic clusters. Experimental results show that the proposed algorithm has better performance than conventional Suffix Tree Clustering (STC).
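The core idea of combining string matching with semantic matching in a suffix structure can be illustrated with a small sketch. This is not the authors' SSTC implementation: it uses a word-level suffix trie rather than a compact suffix tree, replaces WordNet similarity with a hand-written synonym table (`SYNONYMS`, hypothetical), and omits the subject-verb-object classification and directed pruning described in the abstract. The names `normalize`, `build_suffix_trie`, and `base_clusters` are illustrative assumptions.

```python
# Hypothetical synonym table standing in for a WordNet lookup (assumption).
SYNONYMS = {"automobile": "car", "auto": "car", "purchase": "buy"}


def normalize(word):
    """Map a word to a canonical form so that synonyms share one trie edge."""
    w = word.lower()
    return SYNONYMS.get(w, w)


class Node:
    """Trie node: child edges keyed by word, plus the snippets passing through."""

    def __init__(self):
        self.children = {}
        self.docs = set()


def build_suffix_trie(snippets, max_depth=3):
    """Insert every word-level suffix of every snippet, truncated to max_depth words."""
    root = Node()
    for doc_id, text in enumerate(snippets):
        words = [normalize(w) for w in text.split()]
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_depth]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)
    return root


def base_clusters(root, min_docs=2):
    """Return {phrase: set of snippet ids} for phrases shared by at least min_docs snippets."""
    clusters = {}
    stack = [((), root)]
    while stack:
        phrase, node = stack.pop()
        if phrase and len(node.docs) >= min_docs:
            clusters[" ".join(phrase)] = set(node.docs)
        for word, child in node.children.items():
            stack.append((phrase + (word,), child))
    return clusters


if __name__ == "__main__":
    snippets = ["buy a car", "purchase an automobile", "cheap auto insurance"]
    clusters = base_clusters(build_suffix_trie(snippets))
    for label, docs in sorted(clusters.items()):
        print(label, sorted(docs))
```

With pure string matching, none of these three snippets share a word; after synonym normalization, the first two fall under the label "buy" and all three under "car". A real system would replace the lookup table with a similarity measure over a lexical database and would need to merge and prune overlapping clusters.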

Sumanta Guha | Jongkol Janruang
