Multi-document Summarization Based on Cluster Using Non-negative Matrix Factorization

This paper introduces a new summarization method that combines non-negative matrix factorization (NMF) with K-means clustering to extract meaningful sentences from multiple documents. The method improves the quality of document summaries in two ways: the semantic features computed by NMF reflect the inherent semantics of the documents well, and the semantic variables derived by NMF allow the sentences most relevant to the given topic to be extracted efficiently. In addition, K-means clustering removes noise, so that biased inherent semantics of the documents are not reflected in the summaries. We perform detailed experiments on the well-known DUC test dataset. The experimental results show that the proposed method performs better than other methods that use LSA, K-means, and NMF.
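The abstract does not spell out how the clustering and factorization steps are combined, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes a non-negative term-by-sentence matrix `A`, factors it as `A ≈ W H` with the standard multiplicative updates (columns of `W` play the role of semantic features, rows of `H` the semantic variables), and uses K-means to discard the sentence clusters least similar to the topic before scoring. The helper names (`nmf`, `kmeans`, `summarize`), the cosine-based cluster filter, the scoring rule, and the toy data are all assumptions made for illustration.

```python
import random

def matmul(X, Y):
    """Plain-Python product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def nmf(A, r, iters=200, seed=0, eps=1e-9):
    """Factor A (terms x sentences) into non-negative W (terms x r) and
    H (r x sentences) using multiplicative updates for the Frobenius error."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return W, H

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; the first k points seed the centroids so the
    result is deterministic (a simplification chosen for this sketch)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def summarize(A, query, k=2, r=2, top=2):
    """Return indices of up to `top` sentences judged most relevant to `query`."""
    sentences = transpose(A)                      # one term vector per sentence
    labels, centroids = kmeans(sentences, k)
    # Assumed noise filter: keep only the cluster closest to the topic.
    best = max(range(k), key=lambda c: cosine(centroids[c], query))
    kept = [j for j, l in enumerate(labels) if l == best]
    sub = transpose([sentences[j] for j in kept])  # terms x kept sentences
    W, H = nmf(sub, min(r, len(kept)))
    # Assumed scoring: weight each semantic variable by how similar its
    # semantic feature (a column of W) is to the topic vector.
    weights = [cosine(col, query) for col in transpose(W)]
    scores = [sum(w * H[i][j] for i, w in enumerate(weights))
              for j in range(len(kept))]
    ranked = sorted(range(len(kept)), key=lambda j: -scores[j])
    return [kept[j] for j in ranked[:top]]

# Toy data: rows are the terms nmf, matrix, cluster, weather, rain;
# columns are five sentences, of which sentences 2 and 4 are off-topic.
A = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
]
query = [1, 1, 0, 0, 0]   # topic mentions "nmf" and "matrix"
selected = summarize(A, query)
```

On this toy input the K-means step separates the two weather sentences from the rest, so the returned indices are drawn from sentences 0, 1, and 3. Seeding the centroids with the first `k` points keeps the sketch deterministic; real data would usually call for a more careful initialization.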
