An Intelligent Web Search Using Multi-Document Summarization

The information available on the internet is vast, diverse, and dynamic. Current search engines act as intelligent assistants for internet users: for a given query, they return a list of the best-matching or most relevant web pages. However, the information relevant to a query is often spread across several of the returned pages, which degrades the quality of the search results. Search engines are thus drowning in information but starving for knowledge. Here, we present query-focused extractive summarization of search engine results. We propose a two-level summarization process: identification of relevant theme clusters, followed by selection of the top-ranking sentences to form a summarized result for the user query. We also propose a new approach to computing semantic similarity that uses semantic roles and semantic meaning. Document clustering is achieved by applying the minimum description length (MDL) principle, and sentence clustering and ranking are performed using symmetric non-negative matrix factorization (SNMF). Experiments demonstrate the effectiveness of the system in semantic text understanding, document clustering, and summarization.
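
The sentence-level step (clustering and ranking with SNMF) can be illustrated with a minimal sketch. This is not the system described above; it rests on several assumptions. The semantic similarity measure is replaced by a plain bag-of-words cosine similarity, the factorization uses a commonly used damped multiplicative update for approximating a symmetric matrix W by H Hᵀ, and all function names (`cosine_similarity`, `snmf`, `cluster_and_rank`) and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (assumed stand-in for a semantic measure)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def snmf(W, k, iterations=200, beta=0.5, seed=0):
    """Approximate a symmetric non-negative n x n matrix W by H * H^T (H is n x k)."""
    rng = random.Random(seed)
    n = len(W)
    H = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iterations):
        # W * H, shape n x k
        WH = [[sum(W[i][j] * H[j][c] for j in range(n)) for c in range(k)]
              for i in range(n)]
        # H^T * H, shape k x k
        HtH = [[sum(H[i][a] * H[i][b] for i in range(n)) for b in range(k)]
               for a in range(k)]
        # H * (H^T * H), shape n x k
        HHtH = [[sum(H[i][a] * HtH[a][c] for a in range(k)) for c in range(k)]
                for i in range(n)]
        # Damped multiplicative update; entries stay non-negative.
        H = [[H[i][c] * (1 - beta + beta * WH[i][c] / (HHtH[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return H


def cluster_and_rank(sentences, k):
    """Assign each sentence to its strongest factor, then rank within each cluster."""
    W = [[cosine_similarity(s, t) for t in sentences] for s in sentences]
    H = snmf(W, k)
    clusters = {c: [] for c in range(k)}
    for i, row in enumerate(H):
        c = max(range(k), key=lambda idx: row[idx])
        clusters[c].append((row[c], sentences[i]))
    # Higher factor weight is treated here as higher rank within the cluster.
    return {c: [s for _, s in sorted(members, key=lambda m: m[0], reverse=True)]
            for c, members in clusters.items()}


if __name__ == "__main__":
    example = [
        "cats purr softly",
        "cats purr loudly",
        "stocks fell sharply",
        "stocks fell today",
    ]
    for cluster_id, ranked in cluster_and_rank(example, k=2).items():
        print(cluster_id, ranked)
```

In this sketch, a summary would be formed by taking the top-ranked sentence or sentences from each cluster. Using the factor weight as the ranking score is a simplification chosen for brevity; a query-focused variant would also need to account for similarity to the query.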
