Multi-candidate reduction: Sentence compression as a tool for document summarization tasks

This article examines the application of two single-document sentence compression techniques, a "parse-and-trim" approach and a statistical noisy-channel approach, to the problem of multi-document summarization. We introduce the multi-candidate reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates are then selected for inclusion in the final summary based on a combination of static and dynamic features. Evaluations demonstrate that sentence compression is a valuable component of a larger multi-document summarization framework.
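
The generate-then-select idea behind MCR can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy compressor simply truncates a sentence at commas (it is neither the parse-and-trim nor the noisy-channel method), the static feature is a hypothetical position score, the dynamic feature is a hypothetical word-overlap penalty against the summary built so far, and the names `Candidate`, `generate_candidates`, `redundancy`, and `select` are invented here.

```python
from typing import List, NamedTuple, Set


class Candidate(NamedTuple):
    source_id: int       # index of the source sentence this candidate came from
    text: str            # the (possibly compressed) sentence text
    static_score: float  # computed once, before selection starts


def generate_candidates(sentences: List[str]) -> List[Candidate]:
    """Toy stand-in for a sentence compressor (assumption): emit the full
    sentence plus every prefix that ends at a comma."""
    candidates = []
    for idx, sentence in enumerate(sentences):
        parts = sentence.split(",")
        variants = [sentence]
        for cut in range(1, len(parts)):
            variants.append(",".join(parts[:cut]).strip())
        for text in variants:
            # Hypothetical static feature: earlier sentences score higher.
            candidates.append(Candidate(idx, text, 1.0 / (1 + idx)))
    return candidates


def redundancy(text: str, summary_words: Set[str]) -> float:
    """Hypothetical dynamic feature: fraction of the candidate's distinct
    words that already appear in the summary."""
    words = set(text.lower().split())
    if not words:
        return 0.0
    return len(words & summary_words) / len(words)


def select(candidates: List[Candidate], word_budget: int,
           redundancy_weight: float = 1.0) -> List[str]:
    """Greedy selection: repeatedly take the best-scoring candidate that
    fits the remaining budget, using at most one candidate per source."""
    summary: List[str] = []
    used_sources: Set[int] = set()
    summary_words: Set[str] = set()
    remaining = word_budget
    while True:
        best = None
        best_score = float("-inf")
        for cand in candidates:
            length = len(cand.text.split())
            if cand.source_id in used_sources or length > remaining:
                continue
            # The dynamic part is recomputed every round because it depends
            # on what has already been selected.
            score = cand.static_score - redundancy_weight * redundancy(
                cand.text, summary_words)
            if score > best_score:
                best, best_score = cand, score
        if best is None:
            break
        summary.append(best.text)
        used_sources.add(best.source_id)
        summary_words |= set(best.text.lower().split())
        remaining -= len(best.text.split())
    return summary
```

The point of the sketch is the split between the two kinds of features: a static score is fixed per candidate, while a dynamic score must be re-evaluated after each selection because it depends on the current state of the summary. Offering several compressions of the same sentence lets the selector pick a shorter variant when the full sentence would exceed the length budget.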