Multi-candidate reduction: Sentence compression as a tool for document summarization tasks

This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization: a "parse-and-trim" approach and a statistical noisy-channel approach. We introduce the multi-candidate reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates are then selected for inclusion in the final summary based on a combination of static and dynamic features. Evaluations demonstrate that sentence compression is a valuable component of a larger multi-document summarization framework.
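
The abstract describes MCR only at a high level: generate many compressed candidates per source sentence, then choose among them using static and dynamic features. The Python sketch below illustrates that two-stage idea under stated assumptions. It is not the authors' implementation. The removable word spans stand in for real parse-and-trim rules. The feature names (`position`, `compression`, `redundancy`), the weights, the 12-word budget, and the greedy loop that rescores after each pick are all hypothetical choices.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Candidate:
    source_id: int                # index of the source sentence this compresses
    words: list[str]              # tokens of the compressed sentence
    static: dict[str, float] = field(default_factory=dict)  # fixed before selection


def generate_candidates(source_id, words, optional_spans):
    """Return one candidate per subset of removable [start, end) spans.

    HYPOTHETICAL stand-in for a parse-and-trim compressor: the removable
    spans are given by hand instead of being found by rules over a parse tree.
    """
    out, seen = [], set()
    for k in range(len(optional_spans) + 1):
        for drop in combinations(optional_spans, k):
            removed = {i for start, end in drop for i in range(start, end)}
            kept = [w for i, w in enumerate(words) if i not in removed]
            if kept and tuple(kept) not in seen:
                seen.add(tuple(kept))
                out.append(Candidate(source_id, kept, {
                    "position": 1.0 / (1 + source_id),       # earlier sentences score higher
                    "compression": len(kept) / len(words),   # 1.0 means untrimmed
                }))
    return out


def redundancy(cand, summary):
    """Dynamic feature: share of the candidate's words already in the summary."""
    used = {w.lower() for c in summary for w in c.words}
    return sum(w.lower() in used for w in cand.words) / len(cand.words)


def score(cand, summary, weights):
    """Linear combination of static features plus the dynamic redundancy feature."""
    s = sum(weights.get(name, 0.0) * value for name, value in cand.static.items())
    return s + weights.get("redundancy", 0.0) * redundancy(cand, summary)


def select(candidates, weights, max_words):
    """Greedy selection; the dynamic feature is recomputed after every pick."""
    pool, summary, length = list(candidates), [], 0
    while pool:
        best = max(pool, key=lambda c: score(c, summary, weights))
        pool.remove(best)
        if length + len(best.words) > max_words:
            continue  # too long for the remaining budget; try the next best
        summary.append(best)
        length += len(best.words)
        # keep at most one compression of each source sentence
        pool = [c for c in pool if c.source_id != best.source_id]
    return summary


# Hypothetical weights: favor early, short, non-redundant candidates.
WEIGHTS = {"position": 1.0, "compression": -0.5, "redundancy": -2.0}

s0 = "The senator , who spoke on Tuesday , rejected the proposal".split()
s1 = "Officials said the proposal was rejected on Tuesday".split()
pool = generate_candidates(0, s0, [(2, 8)]) + generate_candidates(1, s1, [(0, 2)])
summary = select(pool, WEIGHTS, max_words=12)
for cand in summary:
    print(" ".join(cand.words))
```

With these toy weights, the trimmed first sentence is picked first. In the second round the untrimmed second sentence scores highest but would exceed the 12-word budget, so its trimmed form is taken instead. Recomputing redundancy after each pick is what makes that feature "dynamic" in this sketch.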

Jimmy J. Lin | Richard M. Schwartz | Bonnie J. Dorr | David M. Zajic
