Map/Reduce Affinity Propagation Clustering Algorithm

Affinity Propagation (AP) is a clustering algorithm that does not require the number of clusters K to be set in advance. We extend the original AP to Map/Reduce Affinity Propagation (MRAP), implemented in Hadoop, a distributed cloud environment. The MRAP architecture is divided into multiple mappers and one reducer in Hadoop. In the experiments, we compare the clustering results of the proposed MRAP with those of the K-means method. The experimental results indicate that the proposed MRAP method performs well in terms of accuracy and Davies–Bouldin index value. In addition, applying the proposed MRAP method can reduce the number of iterations the K-means method needs before convergence, irrespective of the data dimensionality.
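As an illustration of the Davies–Bouldin index used in the evaluation, the following is a minimal sketch under the commonly used definition: for each cluster, take the worst-case ratio of summed within-cluster scatter to centroid distance, then average over clusters. Lower values indicate more compact, better separated clusters. The function names centroid and davies_bouldin and the toy data are assumptions made for this example; they are not taken from the MRAP implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (hypothetical helper).

    Each cluster is a non-empty list of points. Assumes at least two
    clusters and that no two clusters share the same centroid.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's points to its centroid.
    scatter = [
        sum(dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Toy data: the same two clusters, far apart and then closer together.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
closer = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
db_far = davies_bouldin(well_separated)
db_near = davies_bouldin(closer)
```

On this toy data each cluster has scatter 1, so the index is 2 divided by the centroid distance: moving the clusters from a distance of 10 to a distance of 4 raises the index, reflecting poorer separation.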
