Syntactic Simplification for Improving Content Selection in Multi-Document Summarization

In this paper, we explore the use of automatic syntactic simplification for improving content selection in multi-document summarization. In particular, we show that simplifying parentheticals by removing relative clauses and appositives improves sentence clustering, because it forces clustering to be based on central rather than background information. We argue that including parenthetical information in a summary is a reference-generation task rather than a content-selection one, and we implement a baseline reference rewriting module. We evaluate on the test sets from the 2003 and 2004 Document Understanding Conferences and report that simplifying parentheticals yields a significant improvement on the automated evaluation metric ROUGE.
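To illustrate why removing parentheticals can help clustering, the following is a minimal sketch and not the system described in the paper. It assumes a purely regex-based pattern for comma-delimited relative clauses (the name `RELATIVE_CLAUSE` and both helper functions are hypothetical), uses word-level Jaccard overlap as a stand-in for the similarity measure a clusterer might use, and does not attempt appositives, which would need a parser.

```python
import re

# Hypothetical pattern (an assumption for illustration): a comma, a relative
# pronoun, then text up to the next comma or the sentence-final period.
RELATIVE_CLAUSE = re.compile(r",\s+(?:who|whom|whose|which)\b[^,.]*(?:,|(?=\.))")


def strip_relative_clauses(sentence: str) -> str:
    """Remove comma-delimited relative clauses and normalize whitespace."""
    simplified = RELATIVE_CLAUSE.sub("", sentence)
    return re.sub(r"\s+", " ", simplified).strip()


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap, a stand-in for a clustering similarity."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    return len(words_a & words_b) / len(words_a | words_b)


# Two sentences reporting the same event with different background details.
s1 = "The minister, who was born in Leeds, resigned on Monday."
s2 = "The minister, who studied law at Oxford, resigned on Monday."

before = jaccard(s1, s2)
after = jaccard(strip_relative_clauses(s1), strip_relative_clauses(s2))
print(f"similarity before: {before:.2f}, after: {after:.2f}")
```

In this toy case the two sentences share the central event but differ in background detail, so their overlap rises once the relative clauses are removed. A real system would identify clause and appositive boundaries syntactically rather than with a surface pattern.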
