A Point Symmetry-Based Clustering Technique for Automatic Evolution of Clusters

In this paper, a new symmetry-based genetic clustering algorithm is proposed that automatically evolves both the number of clusters and the proper partitioning of a data set. Strings comprise both real numbers and the don't care symbol in order to encode a variable number of clusters. Points are assigned to clusters based on a point symmetry (PS)-based distance rather than the Euclidean distance. A newly proposed PS-based cluster validity index, sym-index, is used to measure the validity of the corresponding partitioning. The algorithm is therefore able to detect both convex and nonconvex clusters, irrespective of their sizes and shapes, as long as they possess the symmetry property. Kd-tree-based nearest neighbor search is used to reduce the complexity of computing the PS-based distance. A proof of the convergence property of the variable string length genetic algorithm with PS-distance-based clustering (VGAPS-clustering) technique is also provided. The effectiveness of VGAPS-clustering compared to the variable string length genetic K-means algorithm (GCUK-clustering) and a recently developed weighted sum validity function-based hybrid niching genetic algorithm (HNGA-clustering) is demonstrated on nine artificial and five real-life data sets.
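
To make the idea of a PS-based distance concrete, the following is a minimal sketch and not the paper's implementation. It assumes one common formulation: a point is reflected through a candidate cluster center, the average distance from that reflection to its `knear` nearest data points gives a symmetry term, and that term is multiplied by the Euclidean distance from the point to the center. The function names, the default `knear=2`, and the brute-force neighbor search (used here in place of the Kd-tree search mentioned above) are all illustrative assumptions.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ps_distance(point, center, data, knear=2):
    """Hypothetical PS-based distance of `point` with respect to `center`.

    Assumed formulation: (mean distance from the reflected point to its
    `knear` nearest data points) * (Euclidean distance point -> center).
    """
    # Reflect the point through the candidate center: x* = 2c - x.
    reflected = [2 * c - x for x, c in zip(point, center)]
    # Brute-force neighbor search; a Kd-tree would replace this step
    # for large data sets.
    nearest = sorted(euclid(reflected, d) for d in data)[:knear]
    d_sym = sum(nearest) / len(nearest)
    return d_sym * euclid(point, center)


# Four points placed symmetrically around the origin.
data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
on_center = ps_distance((1.0, 0.0), (0.0, 0.0), data, knear=1)
off_center = ps_distance((1.0, 0.0), (0.5, 0.0), data, knear=1)
```

With `knear=1`, the reflection of `(1, 0)` through the origin lands exactly on another data point, so the distance with respect to the true center of symmetry is smaller than the distance with respect to the shifted center. Averaging over more than one neighbor keeps the symmetry term from collapsing to zero whenever a single mirror point happens to exist.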
