A novel internal validity index based on the cluster centre and the nearest neighbour cluster

Abstract: Evaluating clustering quality is a crucial step in cluster analysis. This paper designs a new internal cluster validity index, based on the cluster centre and the nearest neighbour cluster, that follows the geometric distribution of the objects. It also proposes a method for determining the optimal number of clusters. Together, the index and the method can evaluate the clustering results produced by a given clustering algorithm and determine the optimal number of clusters for a given dataset. Theoretical analysis and experimental results indicate the validity and good performance of the proposed index and method.
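
The abstract does not state the formula of the proposed index, so the sketch below does not reproduce it. It only illustrates the general workflow the abstract describes: score each candidate partition with an internal index built from cluster centres and each cluster's nearest neighbouring cluster, then pick the number of clusters with the best score. The index used here is a simple stand-in (mean distance of members to their own centre, divided by the distance to the nearest other centre, averaged over clusters; lower is better). The names `kmeans`, `centre_neighbour_ratio` and `choose_k`, the farthest-point initialisation, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start (an assumption)."""
    centres = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its closest already-chosen centre.
        d = np.linalg.norm(X[:, None, :] - np.array(centres)[None, :, :], axis=2).min(axis=1)
        centres.append(X[d.argmax()])
    centres = np.array(centres, dtype=float)

    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


def centre_neighbour_ratio(X, labels, centres):
    """Stand-in index (NOT the paper's): compactness / distance to nearest other centre."""
    centre_d = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2)
    np.fill_diagonal(centre_d, np.inf)  # a cluster is not its own neighbour
    ratios = []
    for j in range(len(centres)):
        members = X[labels == j]
        if len(members) == 0:
            continue  # skip empty clusters
        compactness = np.linalg.norm(members - centres[j], axis=1).mean()
        separation = centre_d[j].min()  # nearest neighbouring cluster centre
        ratios.append(compactness / separation)
    return float(np.mean(ratios))


def choose_k(X, k_values):
    """Score each candidate k (k >= 2) and return the k with the lowest score."""
    scores = {}
    for k in k_values:
        labels, centres = kmeans(X, k)
        scores[k] = centre_neighbour_ratio(X, labels, centres)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic data: three well-separated Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
blob_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in blob_centres])

best_k, scores = choose_k(X, range(2, 7))
print(best_k, scores)
```

Any internal index with a known direction (lower is better, or higher is better) could be swapped in for `centre_neighbour_ratio`; only the `min` in `choose_k` would need to change for a higher-is-better index.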
