An unsupervised approach to generating generic summaries of documents

We model document summarization as a quadratic Boolean programming problem.
We create a modified differential evolution algorithm to solve the optimization problem.
Experimental study shows that the model improves the summarization results.

We present an optimization-based unsupervised approach to automatic document summarization. In the proposed approach, text summarization is modeled as a Boolean programming problem. The model attempts to optimize three properties: (1) relevance, the summary should contain informative textual units that are relevant to the user; (2) redundancy, the summary should not contain multiple textual units that convey the same information; and (3) length, the summary is bounded in length. The approach is applicable to both single-document and multi-document summarization. In both tasks, the documents are split into sentences during preprocessing, and a subset of salient sentences is selected from the document(s). The summary is then generated by concatenating the selected sentences in the order in which they appear in the original document(s). We evaluated the model on the multi-document summarization task. Compared with several existing summarization methods on the open DUC2005 and DUC2007 data sets, our method improves the summarization results significantly, as measured by ROUGE-1, ROUGE-2 and ROUGE-SU4. We attribute this to two factors. First, when extracting summary sentences, the method considers not only the relevance of each sentence to the whole sentence collection but also how well the sentence represents the topic. Second, when generating a summary, the method explicitly addresses the repetition of information. We also demonstrate that the summarization result depends on the similarity measure: combining symmetric and asymmetric similarity measures yields better results than using either one separately.
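The three properties (relevance, redundancy, length) can be folded into a single quadratic Boolean program. The sketch below is one plausible instance, not the authors' exact formulation: it assumes precomputed relevance scores rel[i] (for example, the similarity of sentence i to the whole collection), a pairwise similarity matrix sim, sentence lengths, a length budget L, and a hypothetical redundancy weight lam. It maximizes f(x) = sum_i rel_i x_i - lam * sum_{i<j} sim_ij x_i x_j subject to sum_i len_i x_i <= L with x_i in {0, 1}. The solver is a generic binary differential evolution (DE/rand/1/bin on real-valued vectors decoded by sign, plus a greedy repair step for the length constraint); it is not the modified DE proposed in the paper.

```python
import random
from itertools import combinations


def objective(x, rel, sim, lam):
    """Quadratic Boolean objective: relevance gain minus lam times pairwise redundancy."""
    chosen = [i for i, xi in enumerate(x) if xi]
    gain = sum(rel[i] for i in chosen)
    overlap = sum(sim[i][j] for i, j in combinations(chosen, 2))
    return gain - lam * overlap


def repair(x, rel, lengths, budget):
    """Greedily drop chosen sentences with the lowest rel/length ratio until the budget holds.

    Assumes positive lengths and budget >= 0 (an empty selection is always feasible).
    """
    x = list(x)
    while sum(ln for ln, xi in zip(lengths, x) if xi) > budget:
        chosen = [i for i, xi in enumerate(x) if xi]
        worst = min(chosen, key=lambda i: rel[i] / lengths[i])
        x[worst] = 0
    return x


def binary_de(rel, sim, lengths, budget, lam=1.0, pop_size=30, gens=200,
              F=0.5, CR=0.9, seed=0):
    """Generic binary DE (DE/rand/1/bin); a real vector z decodes to x_i = 1 iff z_i > 0."""
    rng = random.Random(seed)
    n = len(rel)

    def evaluate(z):
        x = repair([1 if zi > 0 else 0 for zi in z], rel, lengths, budget)
        return objective(x, rel, sim, lam), x

    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(pop_size)]
    scores = [evaluate(z) for z in pop]
    for _ in range(gens):
        for k in range(pop_size):
            # Three distinct donor vectors, all different from the target vector k.
            a, b, c = rng.sample([i for i in range(pop_size) if i != k], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated coordinate
            trial = [
                pop[a][j] + F * (pop[b][j] - pop[c][j])
                if j == j_rand or rng.random() < CR else pop[k][j]
                for j in range(n)
            ]
            cand = evaluate(trial)
            if cand[0] >= scores[k][0]:  # greedy one-to-one selection
                pop[k], scores[k] = trial, cand
    best_val, best_x = max(scores, key=lambda s: s[0])
    return best_x, best_val


if __name__ == "__main__":
    # Hypothetical toy data: sentences 0 and 1 are near-duplicates.
    rel = [0.9, 0.85, 0.4, 0.3]
    sim = [[1.0, 0.8, 0.1, 0.0],
           [0.8, 1.0, 0.0, 0.1],
           [0.1, 0.0, 1.0, 0.2],
           [0.0, 0.1, 0.2, 1.0]]
    lengths = [10, 12, 8, 6]  # words per sentence
    x, val = binary_de(rel, sim, lengths, budget=24)
    print(x, round(val, 3))
```

On this toy data, enumerating all 16 selections shows that the best feasible summary under a 24-word budget is sentences 0, 2 and 3 (objective 1.3): picking both near-duplicates 0 and 1 is penalized by their 0.8 similarity. Because the repair step makes every decoded candidate feasible, the selection rule can compare objective values directly instead of using a penalty term for the length constraint.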
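The claim that combining a symmetric and an asymmetric similarity measure beats either alone can be illustrated with a small sketch. As an assumption for illustration only, it pairs cosine similarity (symmetric) with a hypothetical inclusion measure, the share of one sentence's distinct terms that also occur in the other (asymmetric), and mixes them with a hypothetical weight alpha. The specific measures and combination rule used in the paper are not stated in this abstract. Tokenization is plain lowercase whitespace splitting, without the stemming or stop-word removal a real pipeline would normally apply.

```python
import math
from collections import Counter


def tf_vector(sentence):
    """Term-frequency vector from lowercase whitespace tokens (no stemming or stop words)."""
    return Counter(sentence.lower().split())


def cosine_sim(a, b):
    """Symmetric measure: cosine_sim(a, b) == cosine_sim(b, a)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def inclusion_sim(a, b):
    """Hypothetical asymmetric measure: fraction of a's distinct terms also present in b."""
    if not a:
        return 0.0
    return len(set(a) & set(b)) / len(set(a))


def combined_sim(a, b, alpha=0.5):
    """Convex combination of the two measures; alpha is a hypothetical weight."""
    return alpha * cosine_sim(a, b) + (1 - alpha) * inclusion_sim(a, b)


if __name__ == "__main__":
    s1 = tf_vector("the cat sat")
    s2 = tf_vector("the cat sat on the mat")
    print(cosine_sim(s1, s2), inclusion_sim(s1, s2), inclusion_sim(s2, s1))
    print(combined_sim(s1, s2), combined_sim(s2, s1))
```

For "the cat sat" versus "the cat sat on the mat", the cosine score is identical in both directions, while the inclusion score is 1.0 in one direction and 0.6 in the other. The combined score therefore distinguishes a short sentence fully contained in a longer one from the reverse case, which a purely symmetric measure cannot do.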
