Searching web documents using a summarization approach

Purpose: The purpose of this paper is to introduce a summarization method that enhances current web-search approaches by offering a summary of each clustered set of web-search results whose contents address the same topic, which should allow the user to quickly identify the information covered in the clustered search results. Web search engines, such as Google, Bing and Yahoo!, rank the set of documents S retrieved in response to a user query and represent each document D in S using a title and a snippet, which serves as an abstract of D. Snippets, however, are not as useful as intended, i.e. in helping users quickly identify results of interest. They are often inadequate in providing distinct information and in capturing the main contents of the corresponding documents. Moreover, when the information need specified in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user's intended request without additional information. Furthermore, a document title is not always a good indicator of the content of the corresponding document either.

Design/methodology/approach: The authors propose a query-based summarizer, called QSum, to address the existing problems of Web search engines that rely on titles and abstracts to capture the contents of retrieved documents. QSum generates a concise, comprehensive summary for each cluster of documents retrieved in response to a user query, which saves the user time and effort in searching for specific information of interest by removing the need to browse through the retrieved documents one by one.

Findings: Experimental results show that QSum is effective and efficient in creating a high-quality summary for each cluster to enhance Web search.

Originality/value: The proposed query-based summarizer, QSum, is unique in its searching approach. QSum is also a significant contribution to the Web search community, as it handles the ambiguity of a search query by creating summaries in response to different interpretations of the query, which offer a “road map” to help users quickly identify information of interest.
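The abstract does not describe how QSum selects the content of a summary, so the following is only a minimal illustrative sketch of the general idea of query-based, per-cluster extractive summarization. It is not the authors' algorithm. It assumes the search results have already been grouped into clusters, and it swaps in a deliberately simple technique: each sentence is scored by the number of its tokens that match a query term, and the top-scoring sentences are returned in their original order. The function names (`tokenize`, `split_sentences`, `summarize_cluster`), the `max_sentences` parameter, and the sample data are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_cluster(query, documents, max_sentences=2):
    """Return an extractive summary of one cluster of documents.

    Assumption (not from the paper): a sentence's score is the number of
    its tokens that also occur in the query. Ties go to the earlier sentence.
    """
    query_terms = set(tokenize(query))
    sentences = [s for doc in documents for s in split_sentences(doc)]

    def score(index):
        return sum(1 for token in tokenize(sentences[index]) if token in query_terms)

    # Rank sentence indices by descending score, then by original position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    # Keep the best sentences, restored to their original order for readability.
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


# Hypothetical clusters for the ambiguous query "jaguar".
clusters = {
    "animal": [
        "The jaguar is a large cat native to the Americas. It hunts at night.",
        "Zoos often keep big cats. A jaguar can swim well.",
    ],
    "car": [
        "Jaguar is a British car maker. The company was founded in 1922.",
        "Many dealers sell used cars. The Jaguar brand builds luxury sedans.",
    ],
}

for label, docs in clusters.items():
    print(label, "->", summarize_cluster("jaguar", docs))
```

Producing one summary per cluster is what yields the "road map" effect described above: for an ambiguous query, each interpretation gets its own short summary, so the user can pick a cluster without opening individual documents. A realistic system would need a stronger sentence-scoring model and a way to avoid redundant sentences.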

[1]  Qiang Yang,et al.  Query enrichment for web-query classification , 2006, TOIS.

[2]  Phyllis B. Baxendale,et al.  Machine-Made Index for Technical Literature - An Experiment , 1958, IBM J. Res. Dev..

[3]  Dianne P. O'Leary,et al.  Arabic/English Multi-document Summarization with CLASSY - The Past and the Future , 2008, CICLing.

[4]  Dianne P. O'Leary,et al.  QCS: A system for querying, clustering and summarizing documents , 2007, Inf. Process. Manag..

[5]  Kathleen R. McKeown,et al.  Experiments in multidocument summarization , 2002 .

[6]  Kathleen R. McKeown,et al.  Experiments in multidocument summarization , 2002 .

[7]  Stanislaw Osinski Improving Quality of Search Results Clustering with Approximate Matrix Factorisations , 2006, ECIR.

[8]  Amanda Spink,et al.  Real life, real users, and real needs: a study and analysis of user queries on the web , 2000, Inf. Process. Manag..

[9]  Laurie Rozakis Test taking strategies and study skills for the utterly confused , 2003 .

[10]  M. Kenward,et al.  Design and Analysis of Cross-Over Trials , 1989 .

[11]  Chin-Yew Lin,et al.  From Single to Multi-document Summarization : A Prototype System and its Evaluation , 2002 .

[12]  Marko Grobelnik,et al.  Learning Sub-structures of Document Semantic Graphs for Document Summarization , 2004 .

[13]  Yiu-Kai Ng,et al.  Enhancing Web Search Using Query-Based Clusters and Labels , 2013, 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT).

[14]  Mark Levene,et al.  Search Engines: Information Retrieval in Practice , 2011, Comput. J..

[15]  Michael G. Kenward,et al.  Design and Analysis of Cross-Over Trials, Second Edition , 2003 .

[16]  Rasim M. Alguliev,et al.  Automatic Text Documents Summarization through Sentences Clustering , 2008 .

[17]  Moshe Tennenholtz,et al.  Ranking systems: the PageRank axioms , 2005, EC '05.

[18]  Philip S. Yu,et al.  Adding the temporal dimension to search - a case study in publication search , 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[19]  Massih-Reza Amini,et al.  Incorporating prior knowledge into a transductive ranking algorithm for multi-document summarization , 2009, SIGIR.

[20]  Christopher S. G. Khoo,et al.  AUTOMATIC MULTI-DOCUMENT SUMMARIZATION FOR DIGITAL LIBRARIES , 2006 .

[21]  Yuji Matsumoto,et al.  Generic Text Summarization Using Probabilistic Latent Semantic Indexing , 2008, IJCNLP.

[22]  Leonard J. Kazmier Schaum's Outline of Business Statistics , 1976 .

[23]  Balaraman Ravindran,et al.  Latent dirichlet allocation based multi-document summarization , 2008, AND '08.

[24]  Eduard H. Hovy,et al.  From Single to Multi-document Summarization , 2002, ACL.

[25]  J. Golbeck In real life , 2016, Science.