Parallel K-PSO based on MapReduce

K-means is widely used in scientific research and commercial applications because of its simplicity and linearity. However, in the face of ever-growing data volumes and rising demand for cluster analysis, improving the performance of K-means has become both challenging and important. This paper therefore proposes an improved method, parallel K-PSO, which combines Particle Swarm Optimization (PSO) with K-means on top of MapReduce. First, PSO is used to improve the global search ability of K-means; then K-means is parallelized with MapReduce to enhance its capability to process massive data. We evaluate the proposed method experimentally.
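The two stages described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a standard global-best PSO in which each particle encodes k candidate centroids and is scored by the sum of squared errors, and it simulates the map and reduce phases in a single Python process instead of on a Hadoop cluster. All function names and parameter values (`pso_seed`, `map_phase`, `reduce_phase`, `w`, `c1`, `c2`, swarm size) are hypothetical choices for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def sse(data, centroids):
    """Fitness: sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)

def pso_seed(data, k, particles=10, iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Stage 1 (assumed form): global-best PSO searching for k good initial centroids."""
    rng = random.Random(seed)
    dim = len(data[0])
    decode = lambda x: [tuple(x[i * dim:(i + 1) * dim]) for i in range(k)]
    # Each particle is a flat list of k*dim coordinates, started at k random data points.
    swarm = [[c for p in rng.sample(data, k) for c in p] for _ in range(particles)]
    vel = [[0.0] * (k * dim) for _ in range(particles)]
    pbest = [list(x) for x in swarm]
    pbest_f = [sse(data, decode(x)) for x in swarm]
    g = min(range(particles), key=pbest_f.__getitem__)
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):
        for i, x in enumerate(swarm):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - x[d])
                             + c2 * r2 * (gbest[d] - x[d]))
                x[d] += vel[i][d]
            f = sse(data, decode(x))
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = list(x), f
                if f < gbest_f:
                    gbest, gbest_f = list(x), f
    return decode(gbest)

def map_phase(split, centroids):
    """Map: emit (cluster index, (point, 1)) for every point in one data split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]

def reduce_phase(pairs, centroids):
    """Reduce: average the points of each cluster; an empty cluster keeps its old centroid."""
    sums, counts = {}, {}
    for j, (p, n) in pairs:
        acc = sums.setdefault(j, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[j] = counts.get(j, 0) + n
    return [tuple(s / counts[j] for s in sums[j]) if j in counts else centroids[j]
            for j in range(len(centroids))]

def kmeans_mapreduce(splits, centroids, iters=10):
    """Stage 2: one simulated MapReduce job per K-means iteration."""
    for _ in range(iters):
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        centroids = reduce_phase(pairs, centroids)
    return centroids
```

A typical call would be `seeds = pso_seed(data, k=2)` followed by `kmeans_mapreduce(splits, seeds)`, where `splits` is the data partitioned into chunks. In a real deployment each split would be handled by a separate mapper, and only the per-cluster sums and counts would travel to the reducers.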
