Fuzzy Clustering for Topic Analysis and Summarization of Document Collections

Large document collections, such as those returned by Internet search engines, are difficult and time-consuming for users to read and analyse. Detecting the common and distinctive topics within a document set and generating multi-document summaries can greatly ease the burden of information management. We show how both can be achieved with a clustering algorithm based on fuzzy set theory that (i) is easy to implement and integrate into a personal information system, (ii) generates a highly flexible data structure for topic analysis and summarization, and (iii) delivers excellent performance.
