A Comparison Study of Cluster Validity Indices Using a Nonhierarchical Clustering Algorithm

Cluster analysis is widely used in the initial stages of data analysis and data reduction. The K-means algorithm, a nonhierarchical clustering algorithm, has regained popularity among researchers in data mining and knowledge discovery, partly because of its low time complexity. The algorithm requires the number of clusters as an input parameter. When this value is not known a priori, a researcher often has to use a cluster validity index to search for a suitable one. In this study, we use computational experiments to examine the performance of cluster validity indices with the K-means algorithm. Our analysis parallels the study performed by Milligan and Cooper on cluster validity indices, which used hierarchical clustering algorithms. We present observations and conclusions resulting from the simulation study.
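The search described above, running K-means for several candidate numbers of clusters and keeping the value that a validity index scores best, can be sketched as follows. This is a minimal illustration under stated assumptions, not the experimental procedure of the study: the Calinski-Harabasz index stands in as one familiar validity index (the abstract does not name the indices examined), the farthest-first seeding is chosen only to keep the sketch deterministic, and all function names (kmeans, calinski_harabasz, choose_k) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain K-means: alternate assignment and centroid update until the
    assignment stops changing. Returns (labels, centers)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid(members)
    return labels, centers


def calinski_harabasz(points, labels, centers):
    """Ratio of between-cluster to within-cluster dispersion, each divided
    by its degrees of freedom. Requires 2 <= k < n and non-zero within-cluster
    dispersion; larger values indicate a better partition."""
    n, k = len(points), len(centers)
    grand = centroid(points)
    sizes = [labels.count(j) for j in range(k)]
    between = sum(sizes[j] * sq_dist(centers[j], grand) for j in range(k))
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, k_values):
    """Score each candidate k and return (best_k, scores_by_k)."""
    scores = {}
    for k in k_values:
        labels, centers = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centers)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Synthetic data for illustration only: three well-separated 2-D blobs.
    rng = random.Random(0)
    blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    points = [
        (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in blobs
        for _ in range(30)
    ]
    best_k, scores = choose_k(points, range(2, 7))
    print(best_k, scores)
```

Any other validity index can be substituted for calinski_harabasz, provided the selection rule in choose_k (here, the maximum) matches the direction in which that index improves.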
