Paper information - Document clustering by fuzzy c-mean algorithm

Document clustering by fuzzy c-mean algorithm

Clustering documents enables the user to gain a good overall view of the information they contain. Most classical clustering algorithms assign each data point to exactly one cluster, forming a crisp partition of the data, whereas fuzzy clustering allows degrees of membership, so a data point can belong to several clusters at once. In this system, documents are clustered using the fuzzy c-means (FCM) clustering algorithm, one of the well-known unsupervised clustering techniques. However, FCM requires the user to predefine the number of clusters, and different numbers of clusters correspond to different fuzzy partitions, so the clustering result needs to be validated. The PBM index and F-measure are used to assess cluster validity.
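As a rough illustration of the approach the abstract describes, the following is a minimal NumPy sketch of the standard FCM iteration, followed by one common formulation of the PBM index. It is not the authors' implementation: the toy document-term matrix `docs`, the function names `fuzzy_c_means` and `pbm_index`, the fuzzifier `m = 2`, the random seed, and the unit-length row normalization are all assumptions made for this example, and the abstract does not say which PBM variant the paper uses.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard FCM. X has shape (n, d); returns centers (c, d) and U (c, n)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)          # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # dist[i, j] = Euclidean distance from center i to point j
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)            # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute the centers so they match the final memberships.
    Um = U ** m
    centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return centers, U

def pbm_index(X, centers, U):
    """One common form of the PBM index (assumed variant); larger is better."""
    c = centers.shape[0]
    e1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()   # one-cluster scatter
    dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
    ec = (U * dist).sum()                                   # membership-weighted scatter
    dc = max(np.linalg.norm(a - b) for a in centers for b in centers)
    return ((e1 / ec) * dc / c) ** 2

# Hypothetical toy document-term counts: 6 documents, 4 terms.
docs = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [0, 0, 3, 3],
], dtype=float)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit-length rows (assumed preprocessing)

centers, U = fuzzy_c_means(docs, c=2)
labels = U.argmax(axis=0)            # hard labels, for inspection only
score = pbm_index(docs, centers, U)
print(np.round(U, 3))
print("PBM index:", score)
```

Because the number of clusters must be chosen in advance, a typical use would be to run `fuzzy_c_means` for several values of `c` and compare the resulting `pbm_index` scores. The F-measure is not sketched here because it needs ground-truth document classes.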

Thaung Thaung Win | Lin Mon
