Validity index for crisp and fuzzy clusters

Abstract

In this article, a cluster validity index and its fuzzification are described, which together provide a measure of the goodness of clustering for different partitions of a data set. The partitioning at which this index, called the PBM-index, attains its maximum value across the hierarchy is taken as the best one. The index is defined as the product of three factors, whose maximization ensures the formation of a small number of compact clusters with large separation between at least two clusters. We have used both the k-means and the expectation maximization (EM) algorithms as the underlying crisp clustering techniques. For fuzzy clustering, we have utilized the well-known fuzzy c-means algorithm. For several artificial and real-life data sets, we provide results showing that the PBM-index determines the appropriate number of clusters better than three other well-known measures: the Davies–Bouldin index, Dunn's index and the Xie–Beni index.
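The abstract only states that the PBM-index is a product of three factors. The sketch below assumes the commonly cited form PBM(K) = ((1/K) · (E_1/E_K) · D_K)^2, where E_1 is the total distance of all points to the overall centroid, E_K is the (membership-weighted) total distance of points to their cluster centres, and D_K is the largest distance between two cluster centres. The 1/K factor favours few clusters, E_1/E_K rewards compactness, and D_K rewards separation between at least two clusters. The function names, the `power` parameter, and the use of plain memberships u_kj (with no fuzzifier exponent) in the fuzzy E_K are illustrative assumptions, not taken from the article text.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _centroid(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a non-empty collection of points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def pbm_fuzzy(points: Sequence[Point], centers: Sequence[Point],
              memberships: Sequence[Sequence[float]], power: float = 2.0) -> float:
    """PBM-index of a (possibly fuzzy) partition (hypothetical helper).

    memberships[k][j] is the membership of point j in cluster k.
    Assumption: E_K weights each distance by u_kj with no fuzzifier exponent.
    """
    k = len(centers)
    if k < 2:
        raise ValueError("the PBM-index needs at least two clusters")
    # E_1: total distance of all points to the single global centroid (K = 1).
    z = _centroid(points)
    e1 = sum(math.dist(x, z) for x in points)
    # E_K: membership-weighted distance of points to each cluster centre (compactness).
    ek = sum(memberships[c][j] * math.dist(x, centers[c])
             for c in range(k) for j, x in enumerate(points))
    # D_K: largest distance between any two cluster centres (separation).
    dk = max(math.dist(centers[a], centers[b])
             for a in range(k) for b in range(a + 1, k))
    # 1/K penalises many clusters; E_1/E_K rewards compactness; D_K rewards separation.
    return ((1.0 / k) * (e1 / ek) * dk) ** power


def pbm_crisp(points: Sequence[Point], labels: Sequence[int], power: float = 2.0) -> float:
    """PBM-index of a crisp partition with labels 0..K-1 (every label must be used)."""
    k = max(labels) + 1
    centers = [_centroid([x for x, lab in zip(points, labels) if lab == c])
               for c in range(k)]
    # A crisp partition is treated as a fuzzy one with 0/1 memberships.
    memberships = [[1.0 if lab == c else 0.0 for lab in labels] for c in range(k)]
    return pbm_fuzzy(points, centers, memberships, power)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for name, labels in [("two blobs", [0, 0, 0, 0, 1, 1, 1, 1]),
                         ("one blob split", [0, 0, 0, 0, 1, 1, 2, 2]),
                         ("mixed", [0, 1, 0, 1, 0, 1, 0, 1])]:
        print(name, round(pbm_crisp(pts, labels), 2))
```

On the toy data, the partition that matches the two well-separated groups scores highest, splitting one group lowers the score through the 1/K factor, and mixing the groups collapses D_K. To choose the number of clusters, one would run k-means, EM or fuzzy c-means for each candidate K ≥ 2, evaluate the index on each resulting partition and keep the K with the largest value. The index is undefined for K = 1 because D_K needs two centres.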
