Data mining refers to extracting, or "mining", knowledge from large amounts of data. Clustering is one of the most important research areas in data mining. It groups objects by their features so that objects in the same group are similar to one another and objects in different groups are dissimilar. In this paper, the most representative partition-based clustering algorithms are described and categorized according to their basic approach, and the best-performing algorithm is identified. Two of these algorithms, centroid-based k-means and representative-object-based k-medoids, are implemented in Java, and their performance is analyzed in terms of clustering quality. Randomly distributed data points are given as input to both algorithms, and the clusters found by each are reported. Each algorithm's performance is analyzed over several runs on the input data points. The experimental results are presented in both graphical and tabular form.
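As an illustration of the centroid-based approach named above, the following is a minimal k-means sketch in Java. It is not the implementation evaluated in this paper: the class name `KMeansSketch`, the method signatures, the use of squared Euclidean distance, and the choice to pass the initial centroids in explicitly are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration; not the implementation evaluated in the paper.
class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension (assumed metric). */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point (lowest index wins ties). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Runs k-means from the given initial centroids; returns the cluster index of each point. */
    static int[] cluster(double[][] points, double[][] initialCentroids, int maxIterations) {
        int n = points.length;
        int k = initialCentroids.length;
        int dim = points[0].length;

        // Copy the initial centroids so the caller's array is left untouched.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // No point moved, so the clustering has converged.
            }

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // An empty cluster keeps its previous centroid.
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {
            {1.0, 1.0}, {1.2, 0.8}, {0.9, 1.1},
            {8.0, 8.0}, {8.2, 7.9}, {7.8, 8.1}
        };
        // Start from one point in each visually obvious group.
        double[][] initial = { points[0], points[3] };
        System.out.println(Arrays.toString(cluster(points, initial, 100)));
    }
}
```

k-medoids follows the same assign-then-update pattern, but each cluster's representative is restricted to an actual data object rather than a computed mean, which makes it less sensitive to outliers.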