Microblog Contextualization: Advantages and Limitations of a Multi-sentence Compression Approach

The content analysis task of the MC2 CLEF 2017 lab aims to generate short summaries in four languages to contextualize microblogs. This paper analyzes the challenges of this task and details the advantages and limitations of our approach, which relies on cross-lingual compressive text summarization. We split the task into several subtasks in order to discuss the setup of each. In addition, we suggest an evaluation protocol to reduce the bias of the current metrics toward extractive approaches.
