Performance Assessment of Some Clustering Algorithms Based on a Fuzzy Granulation-Degranulation Criterion

In this paper, a fuzzy quantization-dequantization criterion is used to propose an evaluation technique for determining which clustering algorithm is appropriate for a particular data set. In general, the goodness of a partitioning is measured by computing the variances within its clusters, which is a measure of the compactness of the obtained partitioning. Here, a new kind of error function, which reflects how well the formed cluster centers represent the whole data set, is used to measure the goodness of the obtained partitioning. Thus the clustering algorithm whose set of centers approximates the whole data set most closely is the one best suited to partitioning that particular data set. Five well-known clustering algorithms are used as the underlying partitioning techniques: GAK-means (a genetic algorithm based K-means algorithm), a newly developed genetic point symmetry based clustering technique (GAPS-clustering), the average linkage clustering algorithm, the expectation maximization (EM) clustering algorithm, and the self-organizing map (SOM). Five artificially generated and three real-life data sets are used to establish that the proposed methodology is able to correctly identify the appropriate clustering algorithm for a particular data set.
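The idea of scoring a set of cluster centers by how well they reconstruct the data can be illustrated with a minimal sketch. The abstract does not give the error function, so the code below assumes the commonly used fuzzy C-means style formulation: each point is encoded as membership degrees in the centers (granulation), decoded back as a weighted mean of the centers (degranulation), and the squared differences between original and decoded points are summed. The function names, the fuzzifier `m = 2.0`, and the toy data are illustrative assumptions, not taken from the paper.

```python
def memberships(x, centers, m=2.0):
    """Granulation: FCM-style membership of point x in each center (assumed form)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
    # A point that coincides with a center belongs fully to that center.
    for i, d in enumerate(d2):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** exponent for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


def reconstruct(x, centers, m=2.0):
    """Degranulation: decode x as a mean of the centers weighted by u_i ** m."""
    weights = [u ** m for u in memberships(x, centers, m)]
    norm = sum(weights)
    return [
        sum(w * v[k] for w, v in zip(weights, centers)) / norm
        for k in range(len(x))
    ]


def reconstruction_error(data, centers, m=2.0):
    """Sum of squared distances between each point and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = reconstruct(x, centers, m)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total


# Toy data: two well-separated groups (hypothetical values).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_centers = [(0, 0), (10, 10)]   # one center per group
bad_centers = [(0, 0), (1, 1)]      # both centers in the same group

err_good = reconstruction_error(data, good_centers)
err_bad = reconstruction_error(data, bad_centers)
print(err_good < err_bad)
```

Under this assumed criterion, the centers produced by each candidate clustering algorithm would be passed to `reconstruction_error`, and the algorithm with the smallest error would be preferred for that data set.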
