Paper information - The Anatomy of SnakeT: A Hierarchical Clustering Engine for Web-Page Snippets

The purpose of a search engine is to retrieve from a given textual collection the documents deemed relevant for a user query. Typically a user query is modeled as a set of keywords, and a document is a Web page, a PDF file, or any other file that can be parsed into a set of tokens (words). Documents are ranked in a flat list according to some measure of relevance to the user query. That list contains hyperlinks to the relevant documents, their titles, and the so-called (page or web) snippets, namely document excerpts that let the user judge whether a document is indeed relevant without accessing it.
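
The flat-list model described above can be illustrated with a minimal sketch. It is not SnakeT's method or any real engine's ranking: the relevance score (number of distinct query keywords found in a document), the snippet heuristic, and all names (`tokenize`, `make_snippet`, `search`) are assumptions chosen only for illustration.

```python
import re

def tokenize(text):
    """Parse text into a list of lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def make_snippet(text, keywords, width=40):
    """Return a short excerpt starting near the earliest keyword occurrence.

    Uses plain substring matching (an illustrative shortcut), so a keyword
    may also match inside a longer word.
    """
    lowered = text.lower()
    positions = [lowered.find(k) for k in keywords if k in lowered]
    if not positions:
        return text[:width]
    start = max(0, min(positions) - width // 2)
    return text[start:start + width]

def search(query, docs):
    """Rank documents in a flat list by a toy relevance score.

    docs: list of (url, title, text) tuples.
    Score (assumed, illustrative): count of distinct query keywords
    that appear among the document's tokens.
    Returns a list of (score, url, title, snippet), best score first.
    """
    keywords = set(tokenize(query))
    results = []
    for url, title, text in docs:
        score = len(keywords & set(tokenize(text)))
        if score > 0:
            snippet = make_snippet(text, sorted(keywords))
            results.append((score, url, title, snippet))
    # Stable sort: ties keep their original collection order.
    results.sort(key=lambda r: -r[0])
    return results

docs = [
    ("u1", "Python snakes", "The python is a large snake found in Asia."),
    ("u2", "Jaguar cars", "Jaguar is a British car maker."),
    ("u3", "Snake care", "How to care for a pet snake at home."),
]
for score, url, title, snippet in search("python snake", docs):
    print(score, url, title, "|", snippet)
```

Each result carries a link, a title, and a snippet, which is the flat presentation that a snippet-clustering engine would instead organize into labeled groups.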

Paolo Ferragina | Antonio Gullì
