An Efficient Enhanced K-Means Approach with Improved Initial Cluster Centers

Abstract: Cluster analysis is one of the major data analysis methods, and the k-means clustering algorithm is widely used in many practical applications. However, the original k-means algorithm is computationally expensive, and the quality of the final clusters depends heavily on the choice of initial centroids, which are selected randomly. Many improvements to k-means have been proposed, but most of them require additional inputs, such as threshold values for the number of data points in a set. This paper proposes a new method for finding better initial centroids and an efficient way of assigning data points to suitable clusters, with reduced time complexity. The algorithm is easy to implement: it requires only a simple data structure that keeps some information from each iteration for use in the next.
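
The abstract does not spell out what information is carried between iterations, so the following Python sketch is only one plausible reading and not the paper's actual algorithm. It assumes the cached information is each point's current cluster label and its distance to that cluster's centroid. A point stays in its cluster without a full search whenever the recomputed centroid is no farther away than the cached distance. The function name `cached_kmeans` and its parameters are hypothetical, and the initial centroids are taken as given, since the abstract does not describe how they are chosen.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Full search: index of the closest centroid and the distance to it."""
    best = min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))
    return best, dist(point, centroids[best])


def cached_kmeans(points, initial_centroids, max_iter=100):
    """k-means variant that reuses per-point labels and distances (assumed design)."""
    centroids = [tuple(c) for c in initial_centroids]

    # Initial pass: a full search for every point, filling the cache.
    labels, cached = [], []
    for p in points:
        j, d = nearest(p, centroids)
        labels.append(j)
        cached.append(d)

    for _ in range(max_iter):
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                n = len(members)
                new_centroids.append(tuple(sum(c) / n for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # centroids are stable, so the assignment will not change
        centroids = new_centroids

        for i, p in enumerate(points):
            d_own = dist(p, centroids[labels[i]])
            if d_own <= cached[i]:
                # Heuristic shortcut: the point is no farther from its own
                # centroid than before, so skip the search over all centroids.
                cached[i] = d_own
            else:
                labels[i], cached[i] = nearest(p, centroids)

    return centroids, labels
```

For example, `cached_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], [(0, 0), (10, 10)])` separates the two groups of three points. The shortcut is a heuristic: it saves distance computations for points that remain close to their centroid, but it is not guaranteed to reproduce the assignments of standard k-means in every case.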
