Extractive Multi-Document Summarization with Integer Linear Programming and Support Vector Regression

We present a new method for generating extractive multi-document summaries. The method uses Integer Linear Programming to jointly maximize the importance and the diversity of the sentences it selects, subject to a maximum allowed summary length. Sentence importance scores are produced by a Support Vector Regression model trained on human-authored summaries, whereas the diversity of the selected sentences is measured as the number of distinct word bigrams in the resulting summary. Experimental results on widely used benchmarks show that our method achieves state-of-the-art results compared with competitive extractive summarizers, while also being computationally efficient.
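
The selection problem described in the abstract can be illustrated with a small sketch. It is not the paper's formulation: the weighting parameter `lam`, the normalization of the two terms, and the function names are assumptions made for illustration. The importance scores are hard-coded here rather than predicted by a Support Vector Regression model, and exhaustive search over sentence subsets stands in for an Integer Linear Programming solver, which is only practical for a handful of sentences.

```python
from itertools import combinations


def bigrams(sentence):
    """Return the set of distinct word bigrams in a sentence."""
    words = sentence.lower().split()
    return set(zip(words, words[1:]))


def summarize(sentences, importance, max_words, lam=0.5):
    """Pick the subset of sentences that maximizes
    lam * (normalized importance) + (1 - lam) * (normalized bigram coverage)
    without exceeding max_words. Brute force; an assumption-laden stand-in
    for solving the same 0/1 selection problem with an ILP solver."""
    n = len(sentences)
    lengths = [len(s.split()) for s in sentences]
    grams = [bigrams(s) for s in sentences]
    total_grams = max(1, len(set().union(*grams)))  # avoid division by zero

    best_score, best_subset = 0.0, ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            # Length constraint: skip subsets that exceed the budget.
            if sum(lengths[i] for i in subset) > max_words:
                continue
            imp = sum(importance[i] for i in subset) / n
            covered = set().union(*(grams[i] for i in subset))
            div = len(covered) / total_grams
            score = lam * imp + (1 - lam) * div
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[i] for i in best_subset], best_score


# Toy input: the first two sentences are nearly redundant.
docs = [
    "the storm hit the coast",
    "the storm hit the coast hard",
    "thousands lost power overnight",
]
scores = [0.9, 0.8, 0.6]  # hypothetical importance scores

balanced, _ = summarize(docs, scores, max_words=11, lam=0.5)
importance_only, _ = summarize(docs, scores, max_words=11, lam=1.0)
print(balanced)
print(importance_only)
```

In this toy input, ranking by importance alone selects the two near-duplicate sentences, while the joint objective replaces one of them with the sentence that contributes new bigrams. A real system would hand the same objective and length constraint to an ILP solver instead of enumerating subsets.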
