A Regression-Based Approach Using Integer Linear Programming for Single-Document Summarization

Most existing approaches to extractive single-document summarization of news articles rely on a single method to summarize all input documents. Recent work has demonstrated that this is a significant limitation, since no summarization technique achieves high performance on every input article. In this context, this paper proposes a new regression-based approach using Integer Linear Programming (ILP) for single-document summarization. The proposed solution relies on a concept-based ILP method to generate multiple candidate summaries for each input article, exploring different concept weighting methods and representation forms. A regression model, enriched with features extracted at the summary, sentence, and n-gram levels, is then applied to select the most informative summary among the candidates, based on an estimate of the traditional ROUGE-1 score. The investigated features are derived from indicators of content importance such as frequency, position, and coverage. Experiments conducted on the DUC 2001-2002 and CNN corpora show that the proposed method outperforms other state-of-the-art extractive summarization approaches by a statistically significant margin in most scenarios with respect to the ROUGE-1 and ROUGE-2 recall measures.
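The two-stage pipeline described above (generate several candidates by concept coverage, then pick one by a predicted score) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's implementation: concepts are taken to be word bigrams, an exhaustive search stands in for the ILP solver (feasible only for a handful of sentences), the two weighting schemes are simplified frequency and position heuristics, and `predicted_rouge1` is a hypothetical linear scorer with made-up coefficients in place of the trained regression model.

```python
from collections import Counter
from itertools import combinations

def concepts(sentence):
    """Represent a sentence as a set of word-bigram concepts (assumption)."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))

def frequency_weights(sentences):
    """Weight each concept by the number of sentences that contain it."""
    return Counter(c for s in sentences for c in concepts(s))

def position_weights(sentences):
    """Weight each concept by the inverse position of its first sentence."""
    weights = {}
    for idx, s in enumerate(sentences):
        for c in concepts(s):
            weights.setdefault(c, 1.0 / (idx + 1))
    return weights

def best_subset(sentences, weights, max_words):
    """Maximize the total weight of covered concepts under a word budget.

    Exhaustive search used as a stand-in for an ILP solver; each concept
    counts once no matter how many selected sentences contain it.
    """
    best, best_score = (), -1.0
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            if sum(len(sentences[i].split()) for i in subset) > max_words:
                continue  # violates the length constraint
            covered = set().union(*(concepts(sentences[i]) for i in subset))
            score = sum(weights.get(c, 0.0) for c in covered)
            if score > best_score:
                best, best_score = subset, score
    return best

def predicted_rouge1(subset, sentences, coef=(0.6, 0.4)):
    """Hypothetical linear scorer over two summary-level features.

    The coefficients are invented for illustration; a real system would
    learn them from summaries with known ROUGE-1 scores.
    """
    all_concepts = set().union(*(concepts(s) for s in sentences))
    covered = set().union(*(concepts(sentences[i]) for i in subset))
    coverage = len(covered) / len(all_concepts)
    position = sum(1.0 / (i + 1) for i in subset) / len(subset)
    return coef[0] * coverage + coef[1] * position

# Toy document (invented), one sentence per list item.
doc = [
    "the council approved the new budget on monday",
    "the new budget raises funding for schools",
    "critics said the vote was rushed",
    "the council approved the plan after a short debate",
]
MAX_WORDS = 15

# Stage 1: one candidate summary per concept weighting scheme.
candidates = {
    "frequency": best_subset(doc, frequency_weights(doc), MAX_WORDS),
    "position": best_subset(doc, position_weights(doc), MAX_WORDS),
}

# Stage 2: keep the candidate with the highest predicted score.
chosen = max(candidates.values(), key=lambda s: predicted_rouge1(s, doc))
print([doc[i] for i in chosen])
```

Swapping `best_subset` for a call to an actual ILP solver and `predicted_rouge1` for a fitted regressor would leave the overall structure unchanged: the candidate set grows with each additional weighting scheme or concept representation, and the selection step stays a single `max` over predicted scores.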
