An improved index for clustering validation based on Silhouette index and Calinski-Harabasz index

Evaluating the quality of clustering results has long been an important problem, and the key question is how to assess the output of a clustering algorithm effectively. Clustering evaluation is generally divided into internal evaluation and external evaluation. This paper focuses on internal evaluation and proposes an improved index based on the Silhouette index and the Calinski-Harabasz index: the Peak Weight Index (PWI). PWI combines the characteristics of the Silhouette index and the Calinski-Harabasz index, takes the peak value of each of the two indexes as the impact point, and assigns an appropriate weight within a certain range. Combining the Silhouette index and the Calinski-Harabasz index in this way helps reduce the fluctuation of clustering results on a data set. Simulation experiments on four self-built influence data sets and two real data sets show that PWI evaluates clustering results well.
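The abstract does not state the PWI formula, so the sketch below does not attempt to reproduce it. It only illustrates, under the standard textbook definitions, the two internal indexes that PWI is built on. The function names `silhouette_index` and `calinski_harabasz_index` and the toy data are assumptions made for illustration, not the authors' implementation.

```python
from math import dist


def _group(points, labels):
    """Collect points by cluster label."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    return clusters


def silhouette_index(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    clusters = _group(points, labels)
    scores = []
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for k, members in clusters.items()
            if k != c
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def calinski_harabasz_index(points, labels):
    """Ratio of between-cluster to within-cluster dispersion,
    each scaled by its degrees of freedom (needs 1 < k < n)."""
    clusters = _group(points, labels)
    n, k, dim = len(points), len(clusters), len(points[0])

    def centroid(ps):
        return [sum(p[d] for p in ps) / len(ps) for d in range(dim)]

    overall = centroid(points)
    between = sum(
        len(m) * dist(centroid(m), overall) ** 2 for m in clusters.values()
    )
    within = sum(
        dist(p, centroid(m)) ** 2 for m in clusters.values() for p in m
    )
    return (between / (k - 1)) / (within / (n - k))


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, silhouette_index(points, labels),
          calinski_harabasz_index(points, labels))
```

Both indexes score the partition that matches the visible grouping higher than the mixed one. An index such as PWI would combine scores like these across candidate partitions, but the weighting itself has to be taken from the paper.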
