Homogeneity Separateness: A New Validity Measure for Clustering Problems

Several validity indices have been designed to evaluate solutions obtained by clustering algorithms. Traditional indices are generally designed to evaluate center-based clustering, where clusters are assumed to be globular in shape with defined centers or representatives. They are therefore not suitable for evaluating clusters of arbitrary shapes, sizes, and densities, which have no defined centers or representatives. In this work, the HS (Homogeneity Separateness) validity measure, based on a different shape, is proposed. It is suitable for clusters of any shape, size, and/or density. The main concepts of the proposed measure are explained, and experimental results on both synthetic and real-life data sets that support the proposed measure are given.