Multi-document Summarization Based on Sentence Clustering

Sentence selection is a central task in multi-document summarization. However, many existing approaches simply select the top-ranked sentences without detecting redundancy, and some approaches that do generate low-redundancy summaries are supervised. To address these issues, we propose a novel method named Redundancy Detection-based Multi-document Summarizer (RDMS). The proposed method first generates an informative sentence set and then applies sentence clustering to detect redundancy. After clustering, it performs cluster ranking, candidate selection, and representative selection to eliminate redundancy. RDMS is an unsupervised multi-document summarization system, and experimental results on the DUC 2004 and DUC 2005 datasets indicate that it outperforms both unsupervised and supervised systems in terms of ROUGE-1, ROUGE-L, and ROUGE-SU.
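
The cluster-then-select idea described above can be illustrated with a minimal sketch. It is not the RDMS implementation: the abstract does not specify the sentence representation, the clustering algorithm, or the ranking criteria, so everything here is an assumption made for illustration. In particular, the bag-of-words vectors, the cosine threshold of 0.5, the greedy single-pass clustering, ranking clusters by size, and all function names (`bag_of_words`, `cluster_sentences`, `representative`, `summarize`) are hypothetical choices.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts (assumed stand-in for the real sentence representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Greedy single-pass clustering: a sentence joins the first cluster
    whose seed (first member) is at least `threshold` similar to it."""
    vectors = [bag_of_words(s) for s in sentences]
    clusters = []  # each cluster is a list of sentence indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters, vectors


def representative(cluster, vectors):
    """Pick the member with the highest total similarity to the other members."""
    return max(
        cluster,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in cluster if j != i),
    )


def summarize(sentences, max_sentences=2, threshold=0.5):
    """Cluster, rank clusters by size (an assumption), keep one sentence per cluster."""
    clusters, vectors = cluster_sentences(sentences, threshold)
    ranked = sorted(clusters, key=len, reverse=True)
    chosen = [representative(c, vectors) for c in ranked[:max_sentences]]
    return [sentences[i] for i in sorted(chosen)]  # restore original order


sentences = [
    "The earthquake struck the city on Monday.",
    "On Monday the earthquake struck the city.",
    "Rescue teams arrived from neighboring countries.",
    "The earthquake struck the city early on Monday.",
]
summary = summarize(sentences)
```

In this toy input the three earthquake sentences fall into one cluster and contribute a single representative, so the near-duplicates do not crowd out the rescue sentence. A real system would need a stronger similarity measure and a principled ranking step.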