Clustering-based optimal summary generation using a Genetic Algorithm

This paper presents a Genetic Algorithm (GA) based sentence extraction strategy, combined with a threshold-based document clustering algorithm, to produce an optimal summary for each cluster. Related documents are first grouped into the same cluster by the threshold-based document clustering algorithm. Important sentences are then selected from each cluster using a feature profile built from sentence-specific features: word weight, sentence position, sentence length, sentence centrality, proper nouns in the sentence, and numerical data in the sentence. A score is calculated for each sentence from this feature profile. To produce the optimal summary, a fitness function is employed that is based on summary quality criteria: maximizing length, coverage, and informativeness while minimizing redundancy. Machine-generated summaries are compared against human summaries using Precision, Recall, F-measure, and the ROUGE-1 measure. The experimental results show that the proposed approach is efficient and outperforms the existing multi-document summarization system based on genetic algorithm (MSBGA) approach.
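The abstract does not give the similarity measure, the clustering threshold, the fitness weights, or the GA operators. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes bag-of-words cosine similarity. It uses single-pass clustering that compares each document against a cluster's first document. Sentence scores are taken as already computed from the feature profile. The fitness weights are hypothetical, and selection, crossover, and mutation use standard tournament selection, one-point crossover, and bit-flip mutation. The names `threshold_cluster`, `fitness`, and `ga_summary`, and all default parameter values, are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def bag(text):
    """Lower-cased bag of words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def threshold_cluster(docs, threshold=0.2):
    """Assumed single-pass clustering: a document joins the most similar cluster
    (similarity measured against that cluster's first document) if the similarity
    reaches `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for i, doc in enumerate(docs):
        sims = [cosine(bag(doc), bag(docs[c[0]])) for c in clusters]
        if sims and max(sims) >= threshold:
            clusters[sims.index(max(sims))].append(i)
        else:
            clusters.append([i])
    return clusters


def fitness(mask, sents, scores, max_words, weights=(1.0, 1.0, 0.5, 1.0)):
    """Hypothetical fitness: reward coverage, informativeness and length,
    penalize redundancy. `scores` are precomputed feature-profile scores."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    n_words = sum(len(sents[i].split()) for i in chosen)
    if not chosen or n_words > max_words:
        return -math.inf  # empty or over-budget summaries are infeasible
    bags = [bag(sents[i]) for i in chosen]
    # Coverage: similarity between the summary and the whole cluster text.
    coverage = cosine(sum(bags, Counter()), sum((bag(s) for s in sents), Counter()))
    informativeness = sum(scores[i] for i in chosen) / (sum(scores) or 1.0)
    length = n_words / max_words  # fraction of the word budget used
    pairs = list(combinations(bags, 2))
    redundancy = sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    w_cov, w_inf, w_len, w_red = weights
    return w_cov * coverage + w_inf * informativeness + w_len * length - w_red * redundancy


def ga_summary(sents, scores, max_words, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Evolve binary chromosomes (bit i = include sentence i) and return
    the indices of the selected sentences of the fittest chromosome."""
    rng = random.Random(seed)
    n = len(sents)

    def fit(mask):
        return fitness(mask, sents, scores, max_words)

    pop = [[rng.random() < 0.3 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=fit, reverse=True)[:2]  # elitism: keep the two best
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fit)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fit)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [(not b) if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return [i for i, bit in enumerate(best) if bit]
```

Under these assumptions, `threshold_cluster` would run first, and `ga_summary` would then be applied to the sentences of each resulting cluster, yielding one summary per cluster. Treating over-budget chromosomes as infeasible (rather than merely penalizing them) is one simple way to reconcile "maximizing length" with a fixed summary size.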