Web-page summarization using clickthrough data

Most previous Web-page summarization methods treat a Web page as plain text. However, such methods fail to uncover the full knowledge associated with a Web page needed in building a high-quality summary, because many of these methods do not consider the hidden relationships in the Web. Uncovering the hidden knowledge is important in building good Web-page summarizers. In this paper, we extract the extra knowledge from the clickthrough data of a Web search engine to improve Web-page summarization. Wefirst analyze the feasibility in utilizing the clickthrough data to enhance Web-page summarization and then propose two adapted summarization methods that take advantage of the relationships discovered from the clickthrough data. For those pages that are not covered by the clickthrough data, we design a thematic lexicon approach to generate implicit knowledge for them. Our methods are evaluated on a dataset consisting of manually annotated pages as well as a large dataset that is crawled from the Open Directory Project website. The experimental results indicate that significant improvements can be achieved through our proposed summarizer as compared to the summarizers that do not use the clickthrough data.

[1]  Wei-Ying Ma,et al.  Web-page classification through summarization , 2004, SIGIR '04.

[2]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[3]  Jade Goldstein-Stewart,et al.  Summarizing text documents: sentence selection and evaluation metrics , 1999, SIGIR '99.

[4]  Jihoon Yang,et al.  Extracting sentence segments for text summarization: a machine learning approach , 2000, SIGIR '00.

[5]  Shui-Lung Chuang,et al.  Enriching Web taxonomies through subject categorization of query terms from search engine logs , 2003, Decis. Support Syst..

[6]  Cécile Paris,et al.  Automatically summarising Web sites: is there a way around it? , 2000, CIKM '00.

[7]  Shui-Lung Chuang,et al.  Using a web-based categorization approach to generate thematic metadata from texts , 2004, TALIP.

[8]  Shourya Roy,et al.  A hierarchical monothetic document clustering algorithm for summarization and browsing search results , 2004, WWW '04.

[9]  Huan Liu,et al.  CubeSVD: a novel approach to personalized Web search , 2005, WWW '05.

[10]  Xin Liu,et al.  Generic text summarization using relevance measure and latent semantic analysis , 2001, SIGIR '01.

[11]  Andreas Paepcke,et al.  Seeing the whole in parts: text summarization for web browsing on handheld devices , 2001, WWW '01.

[12]  Anette Hulth,et al.  Automatic Keyword Extraction Using Domain Knowledge , 2001, CICLing.

[13]  Susan T. Dumais,et al.  Using Linear Algebra for Intelligent Information Retrieval , 1995, SIAM Rev..

[14]  Inderjeet Mani,et al.  The Tipster Summac Text Summarization Evaluation , 1999, EACL.

[15]  Eduard H. Hovy,et al.  Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[16]  Clement T. Yu,et al.  Personalized web search by mapping user queries to categories , 2002, CIKM '02.

[17]  Bernadette Bouchon-Meunier,et al.  Enhanced web document summarization using hyperlinks , 2003, HYPERTEXT '03.

[18]  Mark T. Maybury,et al.  Advances in Automatic Text Summarization , 1999 .