This paper presents a genetic algorithm (GA) based sentence extraction strategy and a threshold-based document clustering algorithm that together produce an optimal summary for each cluster. Related documents are grouped into the same cluster by the threshold-based document clustering algorithm. From each cluster, important sentences are selected using a feature profile built from sentence-specific features: word weight, sentence position, sentence length, sentence centrality, proper nouns in the sentence, and numerical data in the sentence. A score is then calculated for each sentence from its feature profile. To produce an optimal summary, a fitness function is employed that is based on summary quality criteria: maximizing length, coverage, and informativeness while minimizing redundancy. Machine-generated summaries are compared against human summaries using precision, recall, F-measure, and ROUGE-1. The experimental results show that the proposed approach is efficient and outperforms the existing multi-document summarization system based on genetic algorithm (MSBGA).
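The abstract does not specify the similarity measure or the exact clustering procedure. As an illustration only, the following Python sketch assumes bag-of-words term-frequency vectors, cosine similarity, and a single-pass scheme in which each document joins its most similar existing cluster when the similarity reaches a threshold and otherwise opens a new cluster. The function names and the default threshold of 0.3 are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_cluster(docs, threshold=0.3):
    """Hypothetical single-pass threshold clustering (assumed scheme).

    Each document joins the most similar existing cluster if its cosine
    similarity to that cluster's centroid is >= threshold; otherwise it
    starts a new cluster. Returns lists of document indices.
    """
    clusters = []  # each entry: (summed term counts, member indices)
    for idx, doc in enumerate(docs):
        vec = tf_vector(doc)
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0].update(vec)  # summed counts share the centroid's direction
            best[1].append(idx)
        else:
            clusters.append((Counter(vec), [idx]))
    return [members for _, members in clusters]


docs = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock markets fell sharply today",
    "markets fell as stock prices dropped",
]
print(threshold_cluster(docs))  # [[0, 1], [2, 3]]
```

Raising the threshold yields more, tighter clusters: with `threshold=0.9`, each of the four example documents ends up in its own cluster.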
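The six sentence features are named in the abstract but not defined. The Python sketch below assumes simple definitions, each normalized to [0, 1]: word weight as the mean cluster-level frequency of a sentence's words, position as 1 - i/n, length relative to the longest sentence, centrality as mean Jaccard word overlap with the other sentences, and proper nouns and numerical data as the fractions of capitalized non-initial tokens and digit-bearing tokens. The sentence score is assumed to be a weighted sum of the profile with equal default weights; all names here are hypothetical.

```python
import re

FEATURES = ("word_weight", "position", "length",
            "centrality", "proper_nouns", "numerical")


def tokens(sentence):
    """Alphanumeric tokens, keeping original case."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def feature_profile(sentences):
    """Hypothetical feature profile: six features in [0, 1] per sentence.

    The feature definitions are illustrative assumptions.
    """
    toks = [tokens(s) for s in sentences]
    lows = [[t.lower() for t in ts] for ts in toks]
    tf = {}  # term frequencies over the whole cluster
    for ws in lows:
        for w in ws:
            tf[w] = tf.get(w, 0) + 1
    raw_ww = [sum(tf[w] for w in ws) / len(ws) if ws else 0.0 for ws in lows]
    max_ww = max(raw_ww) or 1.0
    max_len = max(len(ts) for ts in toks) or 1
    sets = [set(ws) for ws in lows]
    n = len(sentences)
    profiles = []
    for i, ts in enumerate(toks):
        # Centrality: mean Jaccard word overlap with every other sentence.
        overlaps = [len(sets[i] & sets[j]) / len(sets[i] | sets[j])
                    for j in range(n) if j != i and sets[i] | sets[j]]
        size = len(ts) or 1
        profiles.append({
            "word_weight": raw_ww[i] / max_ww,
            "position": 1 - i / n,  # earlier sentences score higher
            "length": len(ts) / max_len,
            "centrality": sum(overlaps) / len(overlaps) if overlaps else 0.0,
            # Capitalized tokens after the first word as a proper-noun proxy.
            "proper_nouns": sum(t[0].isupper() for t in ts[1:]) / size,
            "numerical": sum(any(c.isdigit() for c in t) for t in ts) / size,
        })
    return profiles


def sentence_scores(sentences, weights=None):
    """Score = weighted sum of the feature profile (equal weights by default)."""
    weights = weights or {f: 1.0 for f in FEATURES}
    return [sum(weights[f] * p[f] for f in FEATURES)
            for p in feature_profile(sentences)]


sents = [
    "Apple reported revenue of 90 billion dollars in 2023.",
    "The company said revenue grew.",
    "Analysts at Apple were surprised.",
]
print([round(s, 2) for s in sentence_scores(sents)])  # [3.3, 2.24, 2.11]
```

Keeping every feature in [0, 1] means the weights alone decide how much each feature contributes; tuning them is left open here because the abstract does not state them.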
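To make the fitness criteria concrete, the Python sketch below encodes a candidate summary as a 0/1 chromosome over a cluster's sentences and combines the four criteria linearly: length (words used relative to a word budget), coverage (fraction of the cluster vocabulary covered), informativeness (share of the total sentence score), and redundancy (mean pairwise Jaccard overlap, subtracted). It assumes per-sentence scores are already available. The linear form, unit weights, word budget, the -1 penalty for over-budget summaries, and the GA operators (tournament selection, one-point crossover, bit-flip mutation, two-member elitism) are all assumptions for illustration, not the paper's stated parameters.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap between two token lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def fitness(chrom, words, scores, max_words, weights=(1.0, 1.0, 1.0, 1.0)):
    """Assumed linear fitness of a 0/1 chromosome (1 = sentence selected)."""
    chosen = [i for i, gene in enumerate(chrom) if gene]
    if not chosen:
        return 0.0
    n_words = sum(len(words[i]) for i in chosen)
    if n_words > max_words:
        return -1.0  # over budget: penalized below the empty summary
    vocab = set().union(*words)
    covered = set().union(*(words[i] for i in chosen))
    length = n_words / max_words
    coverage = len(covered) / (len(vocab) or 1)
    inform = sum(scores[i] for i in chosen) / (sum(scores) or 1.0)
    pairs = list(combinations(chosen, 2))
    redundancy = (sum(jaccard(words[a], words[b]) for a, b in pairs) / len(pairs)
                  if pairs else 0.0)
    w_len, w_cov, w_inf, w_red = weights
    return (w_len * length + w_cov * coverage + w_inf * inform
            - w_red * redundancy)


def ga_summary(sentences, scores, max_words, pop_size=30, generations=60,
               p_mut=0.05, seed=0):
    """Hypothetical GA: tournament selection, one-point crossover,
    bit-flip mutation and two-member elitism. Returns selected indices."""
    rng = random.Random(seed)
    words = [s.lower().split() for s in sentences]
    n = len(sentences)

    def fit(chrom):
        return fitness(chrom, words, scores, max_words)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [0] * n  # seed the empty summary so a within-budget chromosome exists
    for _ in range(generations):
        nxt = sorted(pop, key=fit, reverse=True)[:2]  # elitism
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fit)  # tournament of size 3
            p2 = max(rng.sample(pop, 3), key=fit)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = [1 - g if rng.random() < p_mut else g
                     for g in p1[:cut] + p2[cut:]]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return [i for i, gene in enumerate(best) if gene]


sentences = ["the cat sat", "the cat sat", "dogs bark loudly"]
scores = [1.0, 1.0, 2.0]
print(ga_summary(sentences, scores, max_words=6))
```

Seeding the population with the empty summary (fitness 0) and carrying the two best chromosomes into each generation means the best fitness never drops below 0, so the returned summary can never be one of the over-budget chromosomes scored at -1.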
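The evaluation measures can be sketched as follows, assuming that precision, recall, and F-measure are computed over the sets of sentences selected by the system and by the human summarizer, and that ROUGE-1 is reported as unigram recall against a single reference summary with whitespace tokenization. The function names are hypothetical, and the official ROUGE toolkit also offers options such as stemming and stopword removal that this sketch omits.

```python
from collections import Counter


def prf(system, reference):
    """Precision, recall and F1 over sets of selected sentence IDs."""
    sys_set, ref_set = set(system), set(reference)
    hit = len(sys_set & ref_set)
    p = hit / len(sys_set) if sys_set else 0.0
    r = hit / len(ref_set) if ref_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def rouge_1(candidate, reference):
    """ROUGE-1 recall: clipped unigram overlap / reference unigram count."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per word
    total = sum(ref.values())
    return overlap / total if total else 0.0


print(prf([1, 2, 3], [2, 3, 4]))
# 5 of the 6 reference unigrams are matched
print(rouge_1("the cat sat on the mat", "the cat is on the mat"))
```

Clipping matched counts with `Counter` intersection prevents a candidate from being credited more times for a word than the word appears in the reference.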
[1] Dong-Hong Ji et al., "MSBGA: A Multi-Document Summarization System Based on Genetic Algorithm," 2006 International Conference on Machine Learning and Cybernetics, 2006.
[2] Mark T. Maybury et al., "Automatic Summarization," Computational Linguistics, 2002.
[3] Chin-Yew Lin et al., "From Single to Multi-document Summarization: A Prototype System and its Evaluation," 2002.
[4] Eduard H. Hovy et al., "From Single to Multi-document Summarization," ACL, 2002.
[5] Dragomir R. Radev et al., "Sub-event based multi-document summarization," HLT-NAACL 2003, 2003.
[6] Eduard Hovy et al., "NeATS in DUC 2002," 2002.
[7] Satoshi Sekine et al., "A survey for Multi-Document Summarization," HLT-NAACL 2003, 2003.
[8] Dragomir R. Radev et al., "Centroid-based summarization of multiple documents," Inf. Process. Manag., 2004.
[9] Regina Barzilay et al., "Information Fusion in the Context of Multi-Document Summarization," ACL, 1999.