Paper information - A COMPARATIVE ANALYSIS BETWEEN K-MEDOIDS AND FUZZY C-MEANS CLUSTERING ALGORITHMS FOR STATISTICALLY DISTRIBUTED DATA POINTS


Data clustering is the process of grouping similar data items together. A clustering algorithm partitions a data set into several groups such that the similarity within a group is greater than the similarity between groups. In the field of data mining, many clustering algorithms have been evaluated for their clustering quality. This work describes and analyzes two of the most representative clustering algorithms, the representative-object-based K-Medoids and the centroid-based Fuzzy C-Means, in terms of their basic approach, which relies on the distance between two data points. For both algorithms, a set of n data points in a two-dimensional space and an integer K (the number of clusters) are given, and the problem is to determine a set of K points in that space, called centers, so as to minimize the mean squared distance from each data point to its nearest center. The performance of the algorithms is investigated over different executions of the program on the given input data points. Based on the experimental results, the algorithms are compared with regard to their clustering quality and their performance, where performance is measured by how the running time varies with the number of clusters chosen by the end user. The total elapsed time to cluster all the data points and the clustering time for each cluster are also calculated in milliseconds, and the results are compared with one another.
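To make the setup in the abstract concrete, the following is a minimal sketch of both algorithms on two-dimensional points, with elapsed time reported in milliseconds. It is not the authors' implementation. Several choices are assumptions made for illustration: the K-Medoids variant is the simple alternating (assign, then re-pick medoid) scheme rather than PAM, squared Euclidean distance is used as the dissimilarity, the Fuzzy C-Means fuzzifier is m = 2, and all function names, the synthetic normally distributed test data, and the stopping tolerances are hypothetical.

```python
import random
import time


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def mean_sq_error(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points) / len(points)


def k_medoids(points, k, max_iter=100, seed=0):
    """Alternating k-medoids (assumed variant, simpler than PAM)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # indices of initial medoids
    for _ in range(max_iter):
        labels = [nearest(p, [points[i] for i in medoids]) for p in points]
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # can only happen when points are duplicated
                new_medoids.append(medoids[j])
                continue
            # The new medoid is the member with the smallest total distance
            # to all other members of its cluster.
            new_medoids.append(min(
                members,
                key=lambda c: sum(sq_dist(points[c], points[i]) for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [nearest(p, [points[i] for i in medoids]) for p in points]
    return medoids, labels


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy C-Means; returns the centers and the membership matrix u[i][j]."""
    rng = random.Random(seed)
    n = len(points)
    u = []
    for _ in range(n):  # random memberships, each row normalized to sum to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        centers = []
        for j in range(c):  # centers are membership-weighted means
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append((sum(w[i] * points[i][0] for i in range(n)) / total,
                            sum(w[i] * points[i][1] for i in range(n)) / total))
        shift = 0.0
        for i in range(n):  # u_ij is proportional to (d_ij^2)^(-1/(m-1))
            inv = [max(sq_dist(points[i], ctr), 1e-12) ** (-1.0 / (m - 1.0))
                   for ctr in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical input: 300 normally distributed points around three means.
    data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in [(0, 0), (6, 6), (0, 8)] for _ in range(100)]

    start = time.perf_counter()
    medoids, _ = k_medoids(data, 3)
    kmed_ms = (time.perf_counter() - start) * 1000.0

    start = time.perf_counter()
    centers, _ = fuzzy_c_means(data, 3)
    fcm_ms = (time.perf_counter() - start) * 1000.0

    print("K-Medoids: %.1f ms, MSE %.3f"
          % (kmed_ms, mean_sq_error(data, [data[i] for i in medoids])))
    print("Fuzzy C-Means: %.1f ms, MSE %.3f" % (fcm_ms, mean_sq_error(data, centers)))
```

Both routines depend on the random initialization, so timings and the resulting error vary between runs with different seeds, which is consistent with the abstract's point about measuring performance over repeated executions. The alternating K-Medoids scheme can also stop at a poor local optimum; PAM-style swap search is more thorough but more expensive.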

T. SANTHANAM
