The Anatomy of SnakeT: A Hierarchical Clustering Engine for Web-Page Snippets

The purpose of a search engine is to retrieve from a given textual collection the documents deemed relevant for a user query. Typically a user query is modeled as a set of keywords, and a document is a Web page, a PDF file, or any other file that can be parsed into a set of tokens (words). Documents are ranked in a flat list according to some measure of relevance to the user query. That list contains hyperlinks to the relevant documents, their titles, and also the so-called (page or web) snippets, namely document excerpts that allow the user to judge whether a document is indeed relevant without accessing it.
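As a purely illustrative sketch of this retrieval model (not the method of any actual search engine), the following Python fragment treats a query as a set of keywords, parses documents into tokens, ranks them in a flat list, and attaches a snippet to each result. The names `Result`, `tokenize`, `make_snippet`, and `search`, the keyword-overlap score, and the fixed-width snippet window are all assumptions introduced here for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    """One entry of the flat result list: hyperlink, title, snippet, score."""
    url: str
    title: str
    snippet: str
    score: float


def tokenize(text):
    """Parse text into a list of lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_snippet(text, keywords, width=8):
    """Return a short excerpt around the first query keyword found.

    Assumption: a fixed window of `width` words; falls back to the
    first `width` words when no keyword occurs in the text.
    """
    words = text.split()
    for i, word in enumerate(words):
        if any(token in keywords for token in tokenize(word)):
            start = max(0, i - width // 2)
            return " ".join(words[start:start + width])
    return " ".join(words[:width])


def search(query, collection):
    """Rank documents by the fraction of query keywords they contain.

    `collection` is an iterable of (url, title, text) triples. The
    overlap score is a toy stand-in for a real relevance measure.
    """
    keywords = set(tokenize(query))
    results = []
    for url, title, text in collection:
        doc_tokens = set(tokenize(title + " " + text))
        score = len(keywords & doc_tokens) / len(keywords) if keywords else 0.0
        if score > 0:
            results.append(Result(url, title, make_snippet(text, keywords), score))
    # sorted() is stable, so ties keep their order in the collection.
    return sorted(results, key=lambda r: r.score, reverse=True)


# Hypothetical three-document collection, for illustration only.
collection = [
    ("http://example.org/a", "Pythons",
     "The python is a large snake found in Asia and Africa"),
    ("http://example.org/b", "Python tutorial",
     "Python is a programming language that is easy to learn"),
    ("http://example.org/c", "Pasta", "Cooking pasta takes ten minutes"),
]
for result in search("python snake", collection):
    print(result.score, result.url, "|", result.snippet)
```

Even in this toy setting, the query "python snake" returns pages about two unrelated senses of "python" in a single flat list, and the snippets are what lets the user tell them apart without opening the pages.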