An Adaptively Disperse Centroids K-Means Algorithm Based on MapReduce Model

K-means is a widely used clustering algorithm, but its clustering results depend heavily on the choice of initial centroids. An adaptive method for dispersing the initial centroids is proposed to improve the stability and accuracy of the clustering results. The resulting Adaptively Disperse Centroids K-means algorithm (ADC-K-means) is implemented with the MapReduce model on the Hadoop platform and compared with the k-means implementation in Mahout, a Hadoop-based machine learning library. The experimental results show that the proposed algorithm is effective.
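The abstract does not spell out the adaptive dispersal rule. The sketch below is therefore only a single-process illustration under stated assumptions. It substitutes a simple farthest-point heuristic for the paper's ADC initialization, so that the initial centroids are spread apart. It then imitates MapReduce k-means iterations with plain Python functions. The names `disperse_init`, `map_phase`, `reduce_phase` and `kmeans` are hypothetical and are not Hadoop or Mahout APIs.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def disperse_init(points: List[Point], k: int) -> List[Point]:
    """Pick k spread-out initial centroids (assumed farthest-point heuristic,
    a stand-in for the unspecified ADC rule): start at the point nearest the
    data mean, then repeatedly add the point farthest from its nearest chosen centroid."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centroids = [min(points, key=lambda p: dist2(p, mean))]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def map_phase(points: List[Point], centroids: List[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Mapper: emit (index of nearest centroid, (point, 1)) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
        yield idx, (p, 1)


def reduce_phase(pairs: Iterator[Tuple[int, Tuple[Point, int]]], centroids: List[Point]) -> List[Point]:
    """Reducer: sum the points and counts per centroid index, then average."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for idx, (p, n) in pairs:
        acc = sums.setdefault(idx, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[idx] += n
    # A centroid that received no points keeps its previous position.
    return [tuple(v / counts[i] for v in sums[i]) if i in sums else centroids[i]
            for i in range(len(centroids))]


def kmeans(points: List[Point], k: int, max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Run map/reduce iterations until the centroids stop moving or max_iter is hit."""
    centroids = disperse_init(points, k)
    for _ in range(max_iter):
        new = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist2(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` starts from the well-separated seeds `(10.0, 10.0)` and `(0.0, 0.0)` and converges to centroids near `(1/3, 1/3)` and `(31/3, 31/3)`. On an actual Hadoop cluster, each iteration would typically be one job. The current centroids would be shared with every mapper, and a combiner could pre-sum the `(point, 1)` pairs on each node before the reducer averages them.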
