Jointly Learning to Extract and Compress

We learn a joint model of sentence extraction and compression for multi-document summarization. Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. We train the model using a margin-based objective whose loss captures end summary quality. Because of the exponentially large set of candidate summaries, we use a cutting-plane algorithm to incrementally detect and add active constraints efficiently. Inference in our model can be cast as an ILP and thereby solved in reasonable time; we also present a fast approximation scheme which achieves similar performance. Our jointly extracted and compressed summaries outperform both unlearned baselines and our learned extraction-only system on both ROUGE and Pyramid, without a drop in judged linguistic quality. We achieve the highest published ROUGE results to date on the TAC 2008 data set.
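The core scoring idea above — a linear model whose features factor over the n-gram types covered by the selected sentences — can be sketched in a few lines. This is not the paper's implementation: the bigram features, uniform toy weights, word-count budget, and exhaustive subset search (standing in for the ILP solver the paper actually uses) are all illustrative assumptions, feasible only on toy inputs.

```python
from itertools import combinations

def bigrams(sentence):
    """Set of bigram types in a whitespace-tokenized sentence."""
    toks = sentence.lower().split()
    return {(a, b) for a, b in zip(toks, toks[1:])}

def coverage_score(selected, weights):
    """Score = sum of weights of bigram TYPES covered, each counted once,
    mirroring a model whose features factor over n-gram types rather
    than token occurrences."""
    covered = set()
    for s in selected:
        covered |= bigrams(s)
    return sum(weights.get(bg, 0.0) for bg in covered)

def best_summary(sentences, weights, budget):
    """Exhaustive search over sentence subsets within a word budget.
    The paper casts this maximization as an ILP; brute force here is
    only a stand-in to make the objective concrete."""
    best, best_score = (), float("-inf")
    for r in range(len(sentences) + 1):
        for subset in combinations(sentences, r):
            if sum(len(s.split()) for s in subset) > budget:
                continue
            score = coverage_score(subset, weights)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score
```

Because coverage counts each bigram type once, the objective naturally discourages redundant sentences: a second sentence repeating the same bigrams adds nothing to the score, so diversity emerges from the factorization rather than an explicit redundancy penalty.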
