Paper information - An optimal distributed K-Means clustering algorithm based on cloudstack

Clustering algorithms are applied in many fields, especially data mining. As data volumes grow, the computation time of clustering algorithms becomes prohibitive under the traditional computing model. To handle big data, data mining algorithms have moved from their original single-core or single-port form to parallel and distributed processing, and parallel processing has become the most popular way to improve execution performance. This paper establishes a Hadoop distributed cluster on CloudStack and implements an optimal distributed K-Means clustering algorithm based on MapReduce. The proposed algorithm achieves good result quality and efficient execution time. Experimental results show that it performs better when dealing with large-scale data sets.
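
The abstract does not spell out how the "optimal" variant differs from plain K-Means, so the following is only a minimal sketch of the standard MapReduce formulation of one K-Means iteration, not the authors' implementation. It runs in a single Python process; the `groups` dictionary stands in for Hadoop's shuffle phase, and all function names (`map_assign`, `reduce_update`, `kmeans_iteration`, `kmeans`) are hypothetical.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_update(pairs):
    """Reduce step: average all points emitted under one centroid index."""
    total, count = None, 0
    for point, n in pairs:
        total = list(point) if total is None else [a + b for a, b in zip(total, point)]
        count += n
    return tuple(v / count for v in total)


def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce round; returns the updated centroids."""
    groups = defaultdict(list)  # assumption: stands in for the shuffle phase
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its previous position.
    return [reduce_update(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids
```

In a real Hadoop job the map and reduce functions would run on separate nodes and the current centroids would be distributed to every mapper before each round; here they are simply passed as an argument.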

Ping Ping | Yingchi Mao | Xiaofang Li | Ziyang Xu
