Performance Evaluation of Line Symmetry-Based Validity Indices on Clustering Algorithms

Abstract: Finding the optimal number of clusters and an appropriate partitioning of a given dataset are the two major challenges in clustering. Cluster validity indices are used to address both. In this paper, line symmetry-based versions of seven widely used cluster validity indices, namely the DB, PS, I, XB, FS, K, and SV indices, are developed using a line symmetry-based distance measure. These indices measure the amount of line symmetry present in a partitioning of the dataset. They can detect clusters of any shape or size, as long as the clusters possess the property of line symmetry. The performance of these indices is evaluated with three clustering algorithms: K-means, fuzzy C-means, and modified harmony search-based clustering (MHSC). The efficacy of the symmetry-based validity indices is demonstrated on six artificial and six real-life datasets, with the number of clusters varied from 2 to $\sqrt{n}$, where $n$ is the total number of data points in the dataset. The experimental results reveal that incorporating the line symmetry-based distance improves the ability of the existing validity indices to find the appropriate number of clusters. The line symmetry-based indices are compared with the point symmetry-based and original versions of the same seven validity indices. The results also show that MHSC performs better than the other two well-known clustering techniques. For the real-life datasets, an analysis of variance (ANOVA) is also performed.
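
The evaluation protocol described in the abstract (vary the number of clusters from 2 to $\sqrt{n}$ and keep the partition with the best index value) can be sketched as follows. This is only an illustration under stated assumptions: it uses the classic Euclidean Davies-Bouldin (DB) index rather than the line symmetry-based variant developed in the paper, a plain K-means with a deterministic farthest-first seeding chosen here for reproducibility, and a hypothetical toy dataset of three Gaussian blobs. All function names are illustrative and do not come from the paper.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(data, centers):
    """Index of the nearest center (Euclidean) for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in data]

def kmeans(data, k, max_iter=100):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Assumption of this sketch: deterministic farthest-first seeding.
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = assign(data, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # Keep the old center if a cluster happens to become empty.
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(data, centers)

def davies_bouldin(data, labels):
    """Classic (Euclidean) DB index over non-empty clusters; lower is better."""
    groups = {}
    for p, lab in zip(data, labels):
        groups.setdefault(lab, []).append(p)
    members = list(groups.values())
    centers = [centroid(g) for g in members]
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = [sum(math.dist(p, c) for p in g) / len(g)
               for g, c in zip(members, centers)]
    k = len(centers)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k

def sweep(data):
    """Evaluate the index for K = 2 .. floor(sqrt(n))."""
    k_max = math.isqrt(len(data))
    return {k: davies_bouldin(data, kmeans(data, k)) for k in range(2, k_max + 1)}

# Hypothetical toy data: three well-separated Gaussian blobs, n = 60.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (12.0, 0.0), (6.0, 10.0)]
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in blob_centers for _ in range(20)]

scores = sweep(data)
best_k = min(scores, key=scores.get)  # DB index is minimized
print("best K:", best_k)
```

Replacing `math.dist` inside `davies_bouldin` with a symmetry-based distance would be the natural place to experiment with the idea the abstract describes, although the exact definitions of the line symmetry-based indices are not given in this section. Note also that some of the seven indices are maximized rather than minimized, so the selection step would need to change accordingly.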
