Language-independent multi-document text summarization with document-specific word associations

The goal of automatic text summarization is to generate an abstract of a document or a set of documents. In this paper, we propose a word-association-based method for generating summaries in a variety of languages. We show that a robust statistical method for finding associations specific to the given document(s) is applicable to many languages. We introduce strategies that use the discovered associations to select the sentences from the document(s) that constitute the summary. Empirical results indicate that the method works reliably across a relatively large set of languages and outperforms the methods reported in MultiLing 2013.
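The two stages described above, finding document-specific word associations and then selecting sentences with them, can be illustrated with a minimal sketch. The abstract does not specify the statistic or the selection strategy, so everything below is an assumption made for illustration: associations are scored with a log-likelihood ratio that compares sentence-level co-occurrence in the document against a background corpus, and sentences are picked greedily by the weight of associations they newly cover. The function names (`association_weights`, `select_sentences`), the regex tokenizer, and the whitespace-based length budget are hypothetical and are not taken from the paper.

```python
import math
import re
from itertools import combinations


def tokenize(sentence):
    # Assumption: lowercase runs of Unicode word characters, no stemming or stop lists.
    return set(re.findall(r"\w+", sentence.lower()))


def pair_counts(sentences):
    # Number of sentences in which each unordered word pair co-occurs.
    counts = {}
    for s in sentences:
        for pair in combinations(sorted(tokenize(s)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def log_lik(p, k, n):
    # Binomial log-likelihood of k successes in n trials, skipping zero-count terms.
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1 - p)
    return out


def association_weights(doc_sents, bg_sents):
    # Assumed scoring: log-likelihood ratio of document vs. background co-occurrence.
    # Both sentence lists must be non-empty.
    n1, n2 = len(doc_sents), len(bg_sents)
    doc, bg = pair_counts(doc_sents), pair_counts(bg_sents)
    weights = {}
    for pair, k1 in doc.items():
        k2 = bg.get(pair, 0)
        p1, p2, p = k1 / n1, k2 / n2, (k1 + k2) / (n1 + n2)
        if p1 <= p2:
            continue  # keep only pairs that are more typical of the document
        weights[pair] = 2 * (log_lik(p1, k1, n1) + log_lik(p2, k2, n2)
                             - log_lik(p, k1, n1) - log_lik(p, k2, n2))
    return weights


def select_sentences(doc_sents, weights, max_words):
    # Assumed strategy: greedy weighted coverage of associations under a word budget.
    sent_pairs = [set(combinations(sorted(tokenize(s)), 2)) & weights.keys()
                  for s in doc_sents]
    lengths = [len(s.split()) for s in doc_sents]  # whitespace length is a simplification
    covered, chosen, used = set(), [], 0
    while True:
        best, best_gain = None, 0.0
        for i, pairs in enumerate(sent_pairs):
            if i in chosen or used + lengths[i] > max_words:
                continue
            gain = sum(weights[p] for p in pairs - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        chosen.append(best)
        covered |= sent_pairs[best]
        used += lengths[best]
    return [doc_sents[i] for i in sorted(chosen)]


document = ["solar panels cut energy costs",
            "solar panels need sunlight",
            "the weather was nice"]
background = ["the weather was nice today", "the market was quiet",
              "energy costs rose", "the weather report"]
weights = association_weights(document, background)
summary = select_sentences(document, weights, max_words=5)
```

Nothing in this sketch depends on language-specific resources such as stemmers, stop lists, or lexicons, which is the property the abstract emphasizes. The whitespace-based length count would need replacing for languages written without spaces between words.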
