Multi-Sentence Compression with Word Vertex-Labeled Graphs and Integer Linear Programming

Multi-Sentence Compression (MSC) aims to generate a short sentence with key information from a cluster of closely related sentences. MSC enables summarization and question-answering systems to generate outputs combining fully formed sentences from one or several documents. This paper describes a new Integer Linear Programming method for MSC that uses a vertex-labeled graph to select different keywords and novel 3-gram scores to generate more informative sentences while maintaining their grammaticality. Our system is of good quality and outperforms the state of the art in evaluations conducted on news data. We conducted both automatic and manual evaluations to determine the informativeness and the grammaticality of compressions for each dataset. Additional tests, which take advantage of the fact that the length of compressions can be modulated, still improve ROUGE scores with shorter output sentences.

Juan-Manuel Torres-Moreno | Stéphane Huet | Elvys Linhares Pontes | Andréa Carneiro Linhares | Thiago Gouveia da Silva

[1] Lukasz Kaiser, et al. Sentence Compression by Deletion with LSTMs, 2015, EMNLP.

[2] Jason Weston, et al. A Neural Attention Model for Abstractive Sentence Summarization, 2015, EMNLP.

[3] Kathleen McKeown, et al. Supervised Sentence Fusion with Single-Stage Inference, 2013, IJCNLP.

[4] Rada Mihalcea, et al. TextRank: Bringing Order into Text, 2004, EMNLP.

[5] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[6] Temel Öncan, et al. A comparative analysis of several asymmetric traveling salesman problem formulations, 2009, Comput. Oper. Res.

[7] Fang Chen, et al. An Efficient Approach for Multi-Sentence Compression, 2016, ACML.

[8] Christian Komusiewicz, et al. Evaluation of ILP-Based Approaches for Partitioning into Colorful Components, 2013, SEA.

[9] David Sankoff, et al. OMG! Orthologs in Multiple Genomes - Competing Graph-Theoretical Formulations, 2011, WABI.

[10] J. Clarke, et al. Global inference for sentence compression: an integer linear programming approach, 2008, J. Artif. Intell. Res.

[11] Mirella Lapata, et al. Modelling Compression with Discourse Constraints, 2007, EMNLP.

[12] Minh-Quoc Nghiem, et al. Word Graph-Based Multi-sentence Compression: Re-ranking Candidates Using Frequent Words, 2015, Seventh International Conference on Knowledge and Systems Engineering (KSE).

[13] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.

[14] Ulf Brefeld, et al. Learning to Summarise Related Sentences, 2014, COLING.

[15] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis, 1990, J. Am. Soc. Inf. Sci.

[16] Florian Boudin, et al. Keyphrase Extraction for N-best Reranking in Multi-Sentence Compression, 2013, HLT-NAACL.

[17] Sara Rosenthal, et al. Time-Efficient Creation of an Accurate Sentence Fusion Corpus, 2010, HLT-NAACL.

[18] Phil Blunsom, et al. Language as a Latent Variable: Discrete Generative Models for Sentence Compression, 2016, EMNLP.

[19] Regina Barzilay, et al. Sentence Fusion for Multidocument News Summarization, 2005, CL.

[20] Michael Strube, et al. Sentence Fusion via Dependency Graph Compression, 2008, EMNLP.

[21] Prasenjit Mitra, et al. Multi-Document Abstractive Summarization Using ILP Based Multi-Sentence Compression, 2015, IJCAI.

[22] Chris Callison-Burch, et al. Evaluating Sentence Compression: Pitfalls and Suggested Remedies, 2011, Monolingual@ACL.

[23] Katja Filippova, et al. Multi-Sentence Compression: Finding Shortest Paths in Word Graphs, 2010, COLING.