MCRMR: Maximum coverage and relevancy with minimal redundancy based multi-document summarization

Abstract In this paper, we propose a novel extraction-based method for multi-document summarization that addresses three important properties of a good summary: coverage, non-redundancy, and relevancy. Coverage and non-redundancy are modeled while merging the multiple input documents into a single document; both are measured with a weighted combination of word-embedding and Google-based similarity methods. To incorporate relevancy into the system-generated summaries, the summarization task is modeled as an optimization problem in which various text features, each with an optimized weight, are used to score the sentences and identify the relevant ones. The feature weights are optimized with a meta-heuristic approach, Shark Smell Optimization (SSO). Experiments are performed on six benchmark datasets (DUC04, DUC06, DUC07, TAC08, TAC11, and MultiLing13) using co-selection and content-based performance measures. The experimental results show that the proposed approach is viable and effective for multi-document summarization.
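The two weighted combinations named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives neither the exact formulas nor the feature set, so the mixing parameter `alpha`, the mapping from distance to similarity via `exp(-d)`, the hit counts, and the linear form of `sentence_score` are all assumptions made for illustration. Only the Normalized Google Distance formula follows its standard published definition. In the proposed method the feature weights would come from Shark Smell Optimization, which is not implemented here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term; fxy: pages containing both;
    n: total number of indexed pages (assumed larger than fx and fy).
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return float("inf")  # terms never co-occur: maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def combined_similarity(emb_sim, ngd_value, alpha=0.5):
    """Weighted mix of embedding similarity and an NGD-derived similarity.

    Assumption: the distance is turned into a similarity with exp(-d), and
    alpha is a hypothetical mixing weight in [0, 1].
    """
    ngd_sim = math.exp(-ngd_value)
    return alpha * emb_sim + (1 - alpha) * ngd_sim


def sentence_score(features, weights):
    """Assumed linear relevance score: sum of weight * feature value."""
    return sum(w * f for w, f in zip(weights, features))


if __name__ == "__main__":
    emb = cosine([0.2, 0.7, 0.1], [0.3, 0.6, 0.0])
    dist = ngd(fx=12000, fy=9000, fxy=3000, n=10**9)  # made-up hit counts
    print(combined_similarity(emb, dist, alpha=0.6))
    print(sentence_score([0.8, 0.4, 1.0], [0.5, 0.3, 0.2]))
```

Under these assumptions, a sentence pair with high combined similarity would be treated as redundant when merging documents, while `sentence_score` would rank the remaining sentences for inclusion in the summary.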
