Clustering documents enables the user to gain a good overall view of the information they contain. Most classical clustering algorithms assign each data point to exactly one cluster, thus forming a crisp partition of the given data, whereas fuzzy clustering allows degrees of membership, so that a data point can belong to several clusters to different degrees. In this system, documents are clustered using the fuzzy c-means (FCM) clustering algorithm. FCM is one of the well-known unsupervised clustering techniques. However, the FCM algorithm requires the user to pre-define the number of clusters, and different numbers of clusters correspond to different fuzzy partitions. Validation of the clustering result is therefore needed. The PBM index and the F-measure are used for cluster validity.
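To make the idea of degrees of membership concrete, the following is a minimal sketch of the standard FCM iteration in plain Python. It is not the system described here: the function name `fcm`, the fuzzifier default `m = 2.0`, the Euclidean distance, the random initialisation, and the toy vectors standing in for document term weights are all assumptions made for illustration.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, u), where u[i][j] is the degree of membership
    of point i in cluster j and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Assumed initialisation: random memberships, rows normalised to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by u[i][j] ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for x in points:
            dist = [math.dist(x, v) for v in centers]
            if min(dist) == 0.0:
                # Point sits exactly on a center: give it full membership
                # there (shared equally if several centers coincide).
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        # Stop once no membership changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break

    return centers, u


# Toy data: two hypothetical groups of 2-D "term weight" vectors.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, u = fcm(points, c=2)
for row in u:
    print([round(v, 3) for v in row])
```

Each printed row holds one vector's memberships in the two clusters. Because `c` is fixed by the caller, the same call would be repeated for several candidate values of `c` and the resulting partitions compared with a validity measure.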