Sentence Compression Beyond Word Deletion

In this paper we generalise the sentence compression task. Rather than simply shorten a sentence by deleting words or constituents, as in previous work, we rewrite it using additional operations such as substitution, reordering, and insertion. We present a new corpus that is suited to our task and a discriminative tree-to-tree transduction model that can naturally account for structural and lexical mismatches. The model incorporates a novel grammar extraction method, uses a language model for coherent output, and can be easily tuned to a wide range of compression specific loss functions.

[1]  Mirella Lapata,et al.  Large Margin Synchronous Generation and its Application to Sentence Compression , 2007, EMNLP.

[2]  Thorsten Joachims,et al.  A support vector method for multivariate performance measures , 2005, ICML.

[3]  Kathleen R. McKeown,et al.  Information fusion for multidocument summarization: paraphrasing and generation , 2003 .

[4]  Daniel Marcu,et al.  A Noisy-Channel Model for Document Compression , 2002, ACL.

[5]  Simon Corston-Oliver,et al.  Text compaction for display on very small screens , 2001 .

[6]  Chin-Yew Lin Improving summarization performance by sentence compression: a pilot study , 2003, IRAL.

[7]  Jason Eisner,et al.  Learning Non-Isomorphic Tree Mappings for Machine Translation , 2003, ACL.

[8]  Kathleen McKeown,et al.  Lexicalized Markov Grammars for Sentence Compression , 2007, NAACL.

[9]  Daniel Marcu,et al.  Summarization beyond sentence extraction: A probabilistic approach to sentence compression , 2002, Artif. Intell..

[10]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[11]  Hongyan Jing,et al.  Sentence Reduction for Automatic Text Summarization , 2000, ANLP.

[12]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[13]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[14]  J. Clarke,et al.  Global inference for sentence compression : an integer linear programming approach , 2008, J. Artif. Intell. Res..

[15]  Yi Pan,et al.  Sentence Compression for Automated Subtitling: A Hybrid Approach , 2004, ACL 2004.

[16]  Daniel M. Bikel,et al.  Design of a multi-lingual, parallel-processing statistical parsing engine , 2002 .

[17]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[18]  Ben Taskar,et al.  Alignment by Agreement , 2006, NAACL.

[19]  Daniel Marcu,et al.  What’s in a translation rule? , 2004, NAACL.

[20]  Siobhan Devlin,et al.  Simplifying Text for Language-Impaired Readers , 1999, EACL.

[21]  Chris Callison-Burch,et al.  Paraphrasing with Bilingual Parallel Corpora , 2005, ACL.

[22]  Eugene Charniak,et al.  Supervised and Unsupervised Learning for Sentence Compression , 2005, ACL.

[23]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[24]  Chris Quirk,et al.  Monolingual Machine Translation for Paraphrase Generation , 2004, EMNLP.