Evolutionary Algorithm for Extractive Text Summarization

Text summarization is the process of automatically creating a compressed version of a given document while preserving its information content. There are two types of summarization: extractive and abstractive. Extractive summarization methods reduce the problem of summarization to the problem of selecting a representative subset of the sentences in the original documents. Abstractive summarization may compose novel sentences that do not appear in the original sources. In our study we focus on sentence-based extractive document summarization. Extractive summarization systems are typically based on techniques for sentence extraction and aim to cover the set of sentences that are most important for the overall understanding of a given document. In this paper, we propose an unsupervised document summarization method that creates the summary by clustering and extracting sentences from the original document. For this purpose, new criterion functions for sentence clustering are proposed. Similarity measures play an increasingly important role in document clustering. We have also developed a discrete differential evolution algorithm to optimize the criterion functions. The experimental results show that the proposed approach can improve performance compared with state-of-the-art summarization approaches.
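The pipeline outlined above (cluster the sentences, optimize a clustering criterion with a discrete differential evolution, then extract one sentence per cluster) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the bag-of-words cosine similarity, the criterion in `fitness` (sum over clusters of the average within-cluster similarity), the modular-arithmetic mutation used to keep labels discrete, and all function names and parameter defaults.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fitness(labels, sim, k):
    """Assumed criterion (not the paper's): sum over clusters of the
    within-cluster similarity total divided by cluster size."""
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            return float("-inf")  # penalize solutions with an empty cluster
        total += sum(sim[i][j] for i in members for j in members) / len(members)
    return total


def discrete_de(sim, k, pop_size=20, generations=100, F=0.8, CR=0.9, seed=0):
    """Differential evolution over integer cluster labels (hypothetical
    discrete variant: the mutant is rounded and wrapped modulo k)."""
    rng = random.Random(seed)
    n = len(sim)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    scores = [fitness(p, sim, k) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.sample(others, 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            trial = list(pop[i])
            for d in range(n):
                if rng.random() < CR or d == j_rand:
                    step = round(F * (pop[r2][d] - pop[r3][d]))
                    trial[d] = (pop[r1][d] + step) % k
            s = fitness(trial, sim, k)
            if s >= scores[i]:  # greedy selection between target and trial
                pop[i], scores[i] = trial, s
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


def summarize(sentences, k, **de_kwargs):
    """Cluster sentences, then keep the most central sentence of each cluster."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    labels, _ = discrete_de(sim, k, **de_kwargs)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(max(members, key=lambda i: sum(sim[i][j] for j in members)))
    return [sentences[i] for i in sorted(chosen)]  # keep document order
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns at most `k` sentences in their original order. The search is stochastic, so the selected sentences depend on `seed` and on the assumed criterion.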
