Some connectivity based cluster validity indices

Identifying the correct number of clusters and an appropriate partitioning technique are important considerations in clustering, and several cluster validity indices, primarily based on the Euclidean distance, have been used in the literature for this purpose. In this paper, a new measure of connectivity is incorporated into the definitions of seven cluster validity indices, namely the DB-index, Dunn-index, Generalized Dunn-index, PS-index, I-index, XB-index and SV-index, thereby yielding seven new cluster validity indices that are able to automatically detect clusters of any shape, size or convexity as long as they are well separated. Connectivity is measured using a novel approach based on the concept of the relative neighborhood graph. It is empirically established that incorporating the property of connectivity significantly improves the ability of these indices to identify the appropriate number of clusters. Two well-known clustering techniques, single linkage and K-means, are used as the underlying partitioning algorithms. Results on eight artificially generated and three real-life data sets show that the connectivity-based Dunn-index performs best, ahead of the other six indices. Comparisons are also made with the original versions of these seven cluster validity indices.
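
To make the idea of a graph-based connectivity measure concrete, the following is a minimal sketch and not the paper's implementation. It builds the relative neighborhood graph by its standard definition (two points are joined unless some third point is strictly closer to both of them than they are to each other) and then, as an assumption, takes the "connectivity distance" between two points to be the smallest achievable value of the longest edge on a path between them in that graph. The abstract does not state the exact measure, so the function names and this minimax-path choice are illustrative only.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, dim) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def relative_neighborhood_graph(points):
    """Brute-force O(n^3) relative neighborhood graph.

    Returns (adj, d): a boolean adjacency matrix and the distance matrix.
    """
    d = pairwise_distances(points)
    n = len(points)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            # A point r blocks edge (i, j) when it is strictly closer to
            # both i and j than i and j are to each other.
            blocked = np.maximum(d[i], d[j]) < d[i, j]
            if not blocked.any():
                adj[i, j] = adj[j, i] = True
    return adj, d


def minimax_path_distance(adj, d):
    """Assumed connectivity distance: for each pair of points, the minimum
    over all graph paths of the longest edge on the path
    (a Floyd-Warshall-style update using max in place of +)."""
    m = np.where(adj, d, np.inf)
    np.fill_diagonal(m, 0.0)
    for k in range(len(d)):
        via_k = np.maximum(m[:, k][:, None], m[k, :][None, :])
        m = np.minimum(m, via_k)
    return m


if __name__ == "__main__":
    # Three closely spaced points and one distant point on a line.
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
    adj, d = relative_neighborhood_graph(pts)
    conn = minimax_path_distance(adj, d)
    print(adj.astype(int))
    print(conn)
```

Under this assumed distance, two points at opposite ends of an elongated or non-convex cluster stay "close" as long as a chain of short edges links them, whereas points in well-separated clusters remain far apart. Substituting such a distance for the Euclidean distance in an index like the Dunn-index is one plausible way to obtain the shape-independent behavior the abstract describes.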
