An Internal Cluster Validity Index Using a Distance-based Separability Measure

Evaluating clustering results is a significant part of cluster analysis. In typical unsupervised learning, no true class labels are available for clustering. Thus, a number of internal evaluations, which use only the predicted labels and the data, have been created. They are also called internal cluster validity indices (CVIs). Without true labels, designing an effective CVI is not simple because it is similar to creating a clustering method. Having more CVIs is also crucial because there is no universal CVI that can be used to measure all datasets, and no specific method for selecting a proper CVI for clusters without true labels. Therefore, applying several CVIs to evaluate clustering results is necessary. In this paper, we propose a novel CVI, called the Distance-based Separability Index (DSI), based on a data separability measure. We applied the DSI and eight other internal CVIs, ranging from early studies such as Dunn (1974) to recent studies such as CVDD (2019), for comparison. We used an external CVI as ground truth for the clustering results of five clustering algorithms on 12 real and 97 synthetic datasets. Results show that the DSI is an effective, unique, and competitive CVI compared with the other CVIs. In addition, we summarized the general process for evaluating CVIs and created a new method, rank difference, to compare the results of CVIs.
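
To illustrate what an internal CVI computes from predicted labels and data alone, the following is a minimal sketch of the Dunn index, one of the compared CVIs. It is not the DSI. It assumes Euclidean distance and the common form of the index (smallest between-cluster distance divided by largest within-cluster distance); the function name `dunn_index` and the toy points are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest within-cluster distance.

    Assumption: Euclidean distance and the min-separation / max-diameter form.
    """
    # Group points by their predicted cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Largest diameter: maximum pairwise distance inside any single cluster.
    max_diameter = max(
        (dist(a, b) for c in clusters.values() for a, b in combinations(c, 2)),
        default=0.0,
    )
    # Smallest separation: minimum distance between points of different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Toy data (hypothetical): two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # labels follow the two groups
bad = dunn_index(points, [0, 1, 0, 1])   # labels cut across the groups
```

On this toy data the labeling that follows the two groups scores higher than the one that cuts across them, which is the behavior an internal CVI is expected to show without access to true labels.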

[1] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[2] Donald W. Bouldin, et al. A Cluster Separation Measure, 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Richard J. Roiger, et al. Data Mining: A Tutorial Based Primer, 2002.

[4] Margareta Ackerman, et al. To Cluster, or Not to Cluster: An Analysis of Clusterability Methods, 2018, Pattern Recognit.

[5] Qingsheng Zhu, et al. A Novel Cluster Validity Index Based on Local Cores, 2019, IEEE Transactions on Neural Networks and Learning Systems.

[6] Ronald M. Summers, et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning, 2016, IEEE Transactions on Medical Imaging.

[7] David M. W. Powers, et al. Characterization and evaluation of similarity measures for pairs of clusterings, 2009, Knowledge and Information Systems.

[8] J. H. Ward. Hierarchical Grouping to Optimize an Objective Function, 1963.

[9] Caiming Zhong, et al. An Internal Validity Index Based on Density-Involved Distance, 2019, IEEE Access.

[10] Huda Asfour, et al. Application of unsupervised learning to hyperspectral imaging of cardiac ablation lesions, 2018, Journal of medical imaging.

[11] Shai Ben-David, et al. Measures of Clustering Quality: A Working Set of Axioms for Clustering, 2008, NIPS.

[12] Noel Cressie, et al. Spatial data compression via adaptive dispersion clustering, 2018, Comput. Stat. Data Anal.

[13] Jon M. Kleinberg, et al. An Impossibility Theorem for Clustering, 2002, NIPS.

[14] Bernard Desgraupes. Clustering Indices, 2016.

[15] J. Dunn. Well-Separated Clusters and Optimal Fuzzy Partitions, 1974.

[16] Mark J. Embrechts, et al. On the Use of the Adjusted Rand Index as a Metric for Evaluating Supervised Classification, 2009, ICANN.

[17] Ulrike von Luxburg, et al. A tutorial on spectral clustering, 2007, Stat. Comput.

[18] Tian Zhang, et al. BIRCH: an efficient data clustering method for very large databases, 1996, SIGMOD '96.

[19] Shanlin Yang, et al. A shape-based clustering method for pattern recognition of residential electricity consumption, 2019, Journal of Cleaner Production.

[20] P. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, 1987.

[21] Yambem Jina Chanu, et al. A Survey on Image Segmentation Methods using Clustering Techniques, 2017, European Journal of Engineering and Technology Research.

[22] Dietrich Rebholz-Schuhmann, et al. Deep learning-based clustering approaches for bioinformatics, 2020, Briefings Bioinform.

[23] Hui Xiong, et al. Understanding and Enhancement of Internal Clustering Validation Measures, 2013, IEEE Transactions on Cybernetics.

[24] Samuel B. Williams, et al. ASSOCIATION FOR COMPUTING MACHINERY, 2000.

[25] Isabelle Guyon, et al. Clustering: Science or Art?, 2009, ICML Unsupervised and Transfer Learning.

[26] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[27] Pasi Fränti, et al. WB-index: A sum-of-squares based index for cluster validity, 2014, Data Knowl. Eng.

[28] Ujjwal Maulik, et al. Performance Evaluation of Some Clustering Algorithms and Validity Indices, 2002, IEEE Trans. Pattern Anal. Mach. Intell.