A system for query-specific document summarization

Query-independent document summarization has been studied extensively. However, with the success of Web search engines, query-specific document summarization (as in query result snippets) has become an important problem that has received little attention. We present a method that creates query-specific summaries by identifying the most query-relevant fragments of a document and combining them using the semantic associations within the document. In particular, we first add structure to the documents in a preprocessing stage and convert them to document graphs. The best summaries are then computed as the top spanning trees on the document graphs. We present and experimentally evaluate efficient algorithms that compute summaries in interactive time. Furthermore, we compare the quality of our summarization method with that of current approaches through a user survey.
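
The pipeline described in the abstract (fragments, a document graph, then a small tree connecting the query-relevant fragments) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' algorithm: fragments are whole sentences, edge costs are the inverse of the number of shared words, and the tree is grown greedily by shortest paths rather than by enumerating the top spanning trees. The names `tokenize`, `build_document_graph`, `shortest_path`, and `summarize` are hypothetical.

```python
import heapq
import re
from itertools import combinations


def tokenize(text):
    """Lowercased set of alphanumeric tokens (no stemming or stop-word removal)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_document_graph(fragments, min_overlap=1):
    """Nodes are fragment indices. Two fragments are linked when they share at
    least `min_overlap` words; the edge cost is 1 / (shared word count), an
    assumed stand-in for 'semantic association'."""
    words = [tokenize(f) for f in fragments]
    graph = {i: {} for i in range(len(fragments))}
    for i, j in combinations(range(len(fragments)), 2):
        shared = len(words[i] & words[j])
        if shared >= min_overlap:
            graph[i][j] = graph[j][i] = 1.0 / shared
    return graph, words


def shortest_path(graph, sources, targets):
    """Dijkstra from a set of source nodes to the nearest target node.
    Returns the node path (source first), or None if no target is reachable."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def summarize(fragments, query):
    """Greedy heuristic: start at the fragment matching the most query words,
    then repeatedly attach the nearest fragment that covers a missing word."""
    keywords = tokenize(query)
    graph, words = build_document_graph(fragments)
    matches = {i: words[i] & keywords for i in graph}
    start = max(graph, key=lambda i: len(matches[i]))
    if not matches[start]:
        return []  # no fragment mentions any query word
    tree = {start}
    covered = set(matches[start])
    while covered != keywords:
        targets = {i for i in graph if matches[i] - covered}
        path = shortest_path(graph, tree, targets) if targets else None
        if path is None:
            break  # remaining keywords are absent or unreachable
        tree.update(path)
        for i in path:
            covered |= matches[i]
    return [fragments[i] for i in sorted(tree)]


if __name__ == "__main__":
    doc = [
        "Alice founded a database company in Boston.",
        "The weather was mild that year.",
        "The company later built a search engine.",
        "Search engine snippets help users judge results.",
    ]
    for sentence in summarize(doc, "database snippets"):
        print(sentence)
```

In this toy document, the query words appear in the first and last sentences only; the sketch also keeps the third sentence because it is the connecting fragment between them, which is the intuition behind combining fragments through associations in the document graph rather than returning isolated matches.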
