Sentence Fusion for Multidocument News Summarization

A system that can produce informative summaries, highlighting common information found in many online documents, will help Web users to pinpoint information that they need without extensive reading. In this article, we introduce sentence fusion, a novel text-to-text generation technique for synthesizing common information across documents. Sentence fusion involves bottom-up local multisequence alignment to identify phrases conveying similar information and statistical generation to combine common phrases into a sentence. Sentence fusion moves the summarization field from the use of purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and can synthesize information across sources.
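As a rough illustration of the first step, identifying content that recurs across sentences from different documents, the sketch below counts word n-grams shared by at least a minimum number of input sentences. This is a deliberately crude stand-in: surface n-gram overlap rather than the alignment technique the article describes, and it performs no generation. The function names `ngrams` and `common_phrases`, the `min_support` parameter, and the sample sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def common_phrases(sentences, n=2, min_support=2):
    """Return the n-grams that occur in at least `min_support` sentences.

    Assumption: whitespace tokenization and lowercasing are enough for a toy
    example; a real system would need proper linguistic analysis.
    """
    support = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Use a set so each n-gram is counted at most once per sentence.
        support.update(set(ngrams(tokens, n)))
    return {gram for gram, count in support.items() if count >= min_support}


# Hypothetical input: sentences on the same event from different articles.
sentences = [
    "the army launched an attack on the city",
    "troops launched an attack at dawn",
    "officials said the army launched an attack",
]

shared = common_phrases(sentences, n=3, min_support=2)
for gram in sorted(shared):
    print(" ".join(gram))
```

Raising `min_support` keeps only phrases attested in more of the input sentences, which loosely mirrors the idea of preferring information common to many sources over details mentioned by only one.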
