Word Graph-Based Multi-sentence Compression: Re-ranking Candidates Using Frequent Words

Multi-Sentence Compression (MSC) is the task of producing a short, single-sentence summary from a group of similar sentences. This paper presents a new re-ranking method based on frequent-word extraction, along with our modifications to a word graph-based MSC approach that reduce incorrect output. Compression candidates are re-ranked according to the number of frequent words they contain, so that the most relevant candidate is selected as output. Results of automatic evaluations performed on English and Vietnamese datasets show that the proposed method markedly improves the informativity of the generated compressions.
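
The re-ranking idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the whitespace tokenizer, the small stopword list, the `min_sentences` threshold used to decide what counts as a "frequent" word, and the function names are all assumptions made for illustration. The candidates are assumed to come from an upstream word graph-based compressor, which is not shown here.

```python
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was", "for"}


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in sentence.lower().split())
    return [tok for tok in tokens if tok]


def frequent_words(sentences, min_sentences=2):
    """Return non-stopwords that occur in at least `min_sentences` sentences.

    The threshold is an assumed definition of "frequent".
    """
    sentence_freq = Counter()
    for sentence in sentences:
        # Count each word once per sentence.
        sentence_freq.update({t for t in tokenize(sentence) if t not in STOPWORDS})
    return {word for word, n in sentence_freq.items() if n >= min_sentences}


def rerank(candidates, sentences, min_sentences=2):
    """Order candidates by the number of distinct frequent words they contain.

    `sorted` is stable, so candidates with equal scores keep the order
    assigned by the upstream compressor.
    """
    frequent = frequent_words(sentences, min_sentences)

    def score(candidate):
        return len(frequent & set(tokenize(candidate)))

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    cluster = [
        "The president visited Hanoi on Monday.",
        "On Monday the president arrived in Hanoi for talks.",
        "The president held talks in Hanoi.",
    ]
    candidates = [
        "the president visited",
        "the president held talks in Hanoi",
        "the president visited Hanoi on Monday",
    ]
    for candidate in rerank(candidates, cluster):
        print(candidate)
```

Counting distinct frequent words, rather than every occurrence, keeps a candidate from being rewarded for repeating the same word; this is a design choice of the sketch, not a claim about the paper.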
