Chapter 16 – Cluster Validity

Publisher Summary: This chapter discusses the cluster validity stage of a clustering procedure. It presents methods for the quantitative evaluation of the results of a clustering algorithm, known under the general term cluster validity. Let C denote the clustering structure that results from applying a clustering algorithm to a data set X. Cluster validity can be approached in three ways. The first is to evaluate C in terms of an independently drawn structure that is imposed on X a priori and reflects intuition about the clustering structure of X. Criteria used for this kind of evaluation are called external criteria. External criteria may also be used to measure the degree to which the available data confirm a prespecified structure, without applying any clustering algorithm to X. The second is to evaluate C in terms of quantities that involve only the vectors of X themselves, such as the proximity matrix. Criteria used for this kind of evaluation are called internal criteria. The third is to evaluate C by comparing it with other clustering structures, obtained either from the same clustering algorithm with different parameter values or from other clustering algorithms applied to X. Criteria of this kind are called relative criteria. The chapter also gives the definitions of internal, external, and relative criteria and the random hypotheses used in each case. Indices adopted in the framework of external and internal criteria are presented, with examples showing their use.
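As a small illustration of an external criterion, the sketch below compares a clustering C with an independently specified partition P of the same data set by counting pairs of points on which the two agree (the Rand statistic). The summary does not name this index; it is used here only as one common example of an external index, and the function name `rand_index` and the label-list inputs are assumptions made for illustration.

```python
from itertools import combinations


def rand_index(labels_c, labels_p):
    """Fraction of point pairs on which clustering C and partition P agree.

    A pair counts as an agreement when both structures place the two points
    in the same group, or both place them in different groups.
    """
    if len(labels_c) != len(labels_p):
        raise ValueError("both labelings must cover the same points")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_in_c = labels_c[i] == labels_c[j]
        same_in_p = labels_p[i] == labels_p[j]
        if same_in_c == same_in_p:
            agreements += 1
        total_pairs += 1

    # With fewer than two points there are no pairs to disagree on.
    return agreements / total_pairs if total_pairs else 1.0


if __name__ == "__main__":
    clustering = [0, 0, 1, 1]   # hypothetical output of a clustering algorithm
    reference = [1, 1, 0, 0]    # hypothetical a priori structure on the same points
    print(rand_index(clustering, reference))
```

The index depends only on which points are grouped together, not on the label values, so the two labelings in the usage example count as identical. Judging whether an observed value is unusually high still requires a reference distribution under one of the random hypotheses mentioned in the summary.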
