Compressive approaches for cross-language multi-document summarization

Abstract: The spread of social networks and digital documents has rapidly increased the amount of multilingual information available on the Internet, far more than can be analyzed manually. This paper addresses Cross-Language Text Summarization (CLTS), which produces a summary in a language different from that of the source documents. We describe three compressive CLTS approaches that analyze the text in both the source and target languages to compute the relevance of sentences. Our systems compress sentences at two levels: clusters of similar sentences are compressed using a multi-sentence compression (MSC) method, and single sentences are compressed using a neural network model. The version of our approach that uses multi-sentence compression generated more informative French-to-English cross-lingual summaries than state-of-the-art extractive systems. Moreover, these cross-lingual summaries have a grammatical quality similar to that of extractive approaches.
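
To make the cluster-level compression step concrete, the following is a minimal sketch of one common way to do multi-sentence compression: merge the sentences of a cluster into a word graph and read off the cheapest path from start to end. The abstract does not specify which MSC method the systems use, so the word-graph formulation, the inverse-frequency edge weights, and the names `build_word_graph` and `compress` are all assumptions made for illustration, not the authors' implementation.

```python
import heapq
from collections import defaultdict

# Hypothetical sentinel tokens marking sentence boundaries.
START, END = "<s>", "</s>"


def build_word_graph(sentences):
    """Merge a cluster of similar sentences into a weighted word graph.

    Assumption: identical lowercased words share one node, and an edge
    costs 1 / (number of times that word pair occurs in sequence).
    """
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for left, right in zip(tokens, tokens[1:]):
            counts[(left, right)] += 1

    graph = defaultdict(dict)
    for (left, right), count in counts.items():
        graph[left][right] = 1.0 / count  # frequent transitions are cheap
    return graph


def compress(sentences):
    """Return the words on the cheapest START-to-END path (Dijkstra)."""
    graph = build_word_graph(sentences)
    queue = [(0.0, [START])]
    visited = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == END:
            return " ".join(path[1:-1])
        if node in visited:
            continue
        visited.add(node)
        for successor, weight in graph[node].items():
            if successor not in visited:
                heapq.heappush(queue, (cost + weight, path + [successor]))
    return ""  # no path, e.g. an empty cluster


cluster = [
    "the president visited paris on monday",
    "the president visited paris",
    "on monday the president of france visited paris",
]
print(compress(cluster))
```

This sketch keeps only the word sequence that the cluster agrees on most often. A practical system would typically also enforce a minimum length, require a verb, and weight edges by sentence relevance, none of which is modeled here.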
