Enhancing web search by using query-based clusters and multi-document summaries

Current web search engines, such as Google, Bing, and Yahoo!, rank the set of documents SD retrieved in response to a user query and display each document D in SD with a title and a snippet, which serves as an abstract of D. Snippets, however, are not as useful as intended: they do not reliably help users quickly identify results of interest, if any exist. They often fail to provide distinct information or to capture the main content of the corresponding documents. Moreover, when the information need expressed in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user's intended request without additional input. Furthermore, a document title is not always a good indicator of the content of the corresponding document. All of these design problems can be addressed by our proposed query-based clustering and summarization approach, called $Q_{Sum}$. $Q_{Sum}$ generates a concise and comprehensive summary for each cluster of documents retrieved in response to a user query, which saves the user the time and effort of browsing through the documents one by one in search of specific information of interest. Experimental results show that $Q_{Sum}$ is effective and efficient in generating a high-quality summary for each cluster of documents on a specific topic.
