A Label Quality-Oriented Method for Chinese Web Search Results Clustering

Web search results clustering is an important strategy for organizing snippets in modern search engines, but semantic problems still pose many challenges. This paper proposes a label quality-oriented scheme (LQOS) that aims to improve clustering quality with respect to cluster labels. Specialized for Chinese queries, LQOS is designed around the criterion of a “readable label”: it aims to reduce noisy labels and to produce labels that are informative, complete, and concise. Experiments on Sougou Chinese queries show that LQOS outperforms the Lingo algorithm in both label quality and the percentage of clustered snippets.
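To make the second evaluation metric concrete, the following is a minimal sketch of how a "percentage of clustered snippets" could be computed. It is not taken from the paper: the function name `clustered_percentage`, the dictionary layout, and the convention that unassigned snippets sit in a catch-all group called `"Other"` are all assumptions made for illustration.

```python
def clustered_percentage(clusters, total_snippets, other_label="Other"):
    """Return the share (in percent) of snippets placed in a labeled cluster.

    clusters       -- hypothetical mapping: cluster label -> list of snippet ids
    total_snippets -- number of snippets returned for the query
    other_label    -- assumed name of the catch-all group for unclustered snippets
    """
    if total_snippets <= 0:
        raise ValueError("total_snippets must be positive")

    clustered = set()
    for label, snippet_ids in clusters.items():
        if label == other_label:
            continue  # snippets in the catch-all group do not count as clustered
        clustered.update(snippet_ids)  # a set counts overlapping snippets once

    return 100.0 * len(clustered) / total_snippets


# Illustrative data only: 8 snippets, snippet 2 belongs to two clusters.
example = {
    "苹果手机": [0, 1, 2],
    "苹果公司": [2, 3],
    "Other": [4, 5, 6, 7],
}
print(clustered_percentage(example, total_snippets=8))
```

Using a set means a snippet that appears in several overlapping clusters is counted once, which matters for algorithms such as Lingo that allow overlapping clusters.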
