ReClose: web page summarization combining summary techniques

Purpose – Search engine users are faced with long lists of search results, each of varying relevance. The short text shown for a search result often leads users to form false expectations about the linked web page. As a result, users skip relevant pages and miss valuable insights, and they click on irrelevant pages and waste time. The purpose of this paper is to propose a new summary generation technique, ReClose, which combines query-independent and query-biased summary techniques to improve the accuracy of users' expectations.

Design/methodology/approach – The authors tested the effectiveness of ReClose summaries against Google summaries by surveying 34 participants. Each participant was randomly assigned to use one of the two summary types. Summary effectiveness was judged by the accuracy of each user's expectations.

Findings – Individuals using ReClose summaries showed a 10 per cent increase in expectation accuracy over individuals using Google...
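The abstract does not specify how ReClose scores or merges sentences. As a purely hypothetical illustration of what "combining query-independent and query-biased summary techniques" can mean, the sketch below does three things. It picks one sentence with a query-independent score (here assumed to be the mean document-wide frequency of the sentence's words). It picks another with a query-biased score (here assumed to be the fraction of query terms the sentence contains). It then returns the chosen sentences in document order. All function names and both scoring rules are assumptions made for illustration. This is not the authors' algorithm.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; no stemming or stop-word removal (a simplification).
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def independent_score(sentence: str, doc_freq: Counter) -> float:
    # Query-independent score (assumed): mean document-wide frequency of the sentence's tokens.
    tokens = tokenize(sentence)
    return sum(doc_freq[t] for t in tokens) / len(tokens) if tokens else 0.0


def biased_score(sentence: str, query_terms: set[str]) -> float:
    # Query-biased score (assumed): fraction of distinct query terms present in the sentence.
    tokens = set(tokenize(sentence))
    return len(tokens & query_terms) / len(query_terms) if query_terms else 0.0


def combined_summary(text: str, query: str) -> list[str]:
    """Hypothetical combination: the top query-independent sentence plus the
    top query-biased sentence, deduplicated and returned in document order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    doc_freq = Counter(tokenize(text))
    query_terms = set(tokenize(query))
    indices = range(len(sentences))
    # max() returns the earliest index when scores tie.
    best_general = max(indices, key=lambda i: independent_score(sentences[i], doc_freq))
    best_query = max(indices, key=lambda i: biased_score(sentences[i], query_terms))
    return [sentences[i] for i in sorted({best_general, best_query})]


page = ("Solar panels convert sunlight into electricity. "
        "Solar panels need sunlight and solar panels need cleaning. "
        "The company also sells battery storage for homes.")
print(combined_summary(page, "battery storage"))
# ['Solar panels need sunlight and solar panels need cleaning.', 'The company also sells battery storage for homes.']
```

The frequency-based pick favours the sentence that repeats the page's dominant terms. The query-biased pick adds the sentence that matches the query, even though it is off the page's main topic. This shows in miniature why the two kinds of summary can give a reader different expectations about the same page. A practical system would need, at minimum, stemming, stop-word handling and a length budget.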
