Coarse-Grained Parallel AP Clustering Algorithm based on Intra-Class and Inter-Class Distance

Affinity Propagation (AP) is a clustering algorithm based on message passing between data points; it forms clusters from the pairwise similarities between the data. Unlike traditional clustering methods, AP does not require the number of clusters to be specified in advance, and it is generally regarded as fast and efficient. However, it has limitations when dealing with high-dimensional, complex datasets. To improve the efficiency and accuracy of AP clustering, we propose IOCAP, a coarse-grained parallel AP clustering algorithm based on intra-class and inter-class distances. First, the idea of granularity is introduced to divide the initial dataset into multiple subsets. Second, for each subset, the similarity matrix is improved by combining the intra-class and inter-class distances. Finally, the improved parallel AP clustering is implemented on the MapReduce model. Experiments on the Iris, Diabetes, and MNIST datasets show that IOCAP adapts well to large datasets and effectively improves clustering accuracy while maintaining the clustering quality of AP.
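For readers unfamiliar with the message-passing step, the following is a minimal sketch of standard AP (responsibility and availability updates with damping), not of IOCAP itself. The abstract does not give the improved similarity formula, the partitioning rule, or the MapReduce job layout, so none of them are reproduced here. The function name `affinity_propagation`, the negative squared Euclidean similarity, the damping value, the iteration count, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, max_iter=200):
    """Standard AP on a similarity matrix S whose diagonal holds the preferences.

    Illustrative sketch only: runs a fixed number of iterations and does not
    include the IOCAP similarity improvement or any parallelism.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    idx = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        first_idx = np.argmax(AS, axis=1)
        first_max = AS[idx, first_idx]
        AS[idx, first_idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[idx, first_idx] = S[idx, first_idx] - second_max
        R = damping * R + (1 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = diag
        A = damping * A + (1 - damping) * A_new

    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Each point joins its most similar exemplar; exemplars label themselves.
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels

# Toy data (assumed): two small, well-separated groups in the plane.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
# Similarity: negative squared Euclidean distance (a common choice for AP).
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
# Preference: the minimum similarity, which favours few clusters.
np.fill_diagonal(S, S.min())
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

In a coarse-grained parallel setting such as the one described above, a routine of this kind would presumably run independently on each subset's similarity matrix, but how IOCAP merges the per-subset results is not stated in the abstract.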
