Multi-Sentence Compression with Word Vertex-Labeled Graphs and Integer Linear Programming

Multi-Sentence Compression (MSC) aims to generate a short sentence containing the key information from a cluster of closely related sentences. MSC enables summarization and question-answering systems to produce outputs that combine fully formed sentences from one or several documents. This paper describes a new Integer Linear Programming (ILP) method for MSC that uses a vertex-labeled graph to select different keywords and novel 3-gram scores to generate more informative sentences while maintaining their grammaticality. Our system produces good-quality compressions and outperforms the state of the art in evaluations on news datasets. We conducted both automatic and manual evaluations to assess the informativeness and grammaticality of the compressions for each dataset. Additional tests, which exploit the fact that the compression length can be modulated, still improve ROUGE scores while producing shorter output sentences.
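The general idea of graph-based MSC can be illustrated with a small sketch, assuming a simplified setup that is not the paper's exact formulation. Related sentences are merged into one directed word graph, where identical lowercased tokens share a vertex (a simplification). Each keyword vertex carries a label taken from a hypothetical `keyword_labels` mapping. A compression is a path from the start vertex to the end vertex. In this sketch the path's score is the sum of hypothetical 3-gram scores, defined as the number of input sentences containing each 3-gram, plus a hypothetical bonus `keyword_weight` for every distinct label the path covers. The paper solves its selection problem with an ILP. Here an exhaustive depth-first search over simple paths stands in for that solver, so the sketch only scales to toy graphs. All function and parameter names are illustrative.

```python
from collections import defaultdict

START, END = "<s>", "</s>"


def tokenize(sentence):
    # Lowercase and wrap with sentinel start/end tokens.
    return [START] + sentence.lower().split() + [END]


def build_word_graph(sentences):
    # Directed word graph: identical lowercased tokens share one vertex,
    # and an edge links every pair of adjacent tokens in some sentence.
    edges = defaultdict(set)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for a, b in zip(tokens, tokens[1:]):
            edges[a].add(b)
    return edges


def trigram_counts(sentences):
    # Hypothetical 3-gram score: in how many input sentences a 3-gram occurs.
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for trigram in set(zip(tokens, tokens[1:], tokens[2:])):
            counts[trigram] += 1
    return counts


def compress(sentences, keyword_labels, min_words=4, max_words=12, keyword_weight=2.0):
    # Exhaustive search over simple START->END paths (a stand-in for the ILP).
    edges = build_word_graph(sentences)
    trigrams = trigram_counts(sentences)
    best_score, best_path = float("-inf"), None

    def score(path):
        gram_score = sum(trigrams.get(t, 0) for t in zip(path, path[1:], path[2:]))
        labels = {keyword_labels[w] for w in path if w in keyword_labels}
        return gram_score + keyword_weight * len(labels)

    def dfs(path, visited):
        nonlocal best_score, best_path
        node = path[-1]
        if node == END:
            n_words = len(path) - 2  # exclude START and END
            if min_words <= n_words <= max_words:
                s = score(path)
                if s > best_score:  # strict: first path found wins ties
                    best_score, best_path = s, path
            return
        if len(path) - 1 > max_words:  # already too many words, prune
            return
        for nxt in sorted(edges[node]):  # sorted for deterministic order
            if nxt not in visited:
                visited.add(nxt)
                dfs(path + [nxt], visited)
                visited.remove(nxt)

    dfs([START], {START})
    return " ".join(best_path[1:-1]) if best_path else None


if __name__ == "__main__":
    cluster = ["Police arrested a man in Paris", "A man was arrested in Paris on Monday"]
    labels = {"police": 0, "arrested": 1, "paris": 2}
    print(compress(cluster, labels))  # -> police arrested a man in paris on monday
```

In this toy cluster, the best path fuses the two inputs. It covers all three labels and uses only 3-grams seen in the input. Lowering `max_words` to 6 makes the search return the shorter `police arrested a man in paris`, which loosely mirrors the length modulation mentioned above.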
