Affinity Propagation Clustering Using Centroid-Deviation-Distance Based Similarity

Clustering is a fundamental task in data mining. Affinity propagation clustering (APC) has proven effective in various domains. APC iteratively passes messages between samples, updating the responsibility and availability matrices, and uses these matrices to choose the exemplar (cluster centroid) of each cluster. However, because APC chooses exemplars from among the sample points, an exemplar may deviate from the actual centroid of the cluster it represents. As a result, samples near the decision boundary may have a relatively large similarity to the exemplar of a cluster they do not belong to, and are easily clustered incorrectly. To mitigate this problem, we propose an improved APC based on centroid-deviation-distance similarity (APC-CDD). APC-CDD first runs k-means on all samples to estimate more realistic cluster centroids, and then calculates the approximate centroid deviation distance of each cluster. It then adjusts the similarity between each pair of samples by subtracting the centroid deviation distance of the clusters they belong to. Finally, it applies APC with this adjusted similarity to group the samples. Our empirical study on synthetic and UCI datasets shows that APC-CDD performs better than the original APC and other related approaches.
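The similarity-adjustment step can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the k-means labels are taken as given, the base similarity is the negative squared Euclidean distance commonly used with affinity propagation, the "centroid deviation distance" of a cluster is read as the distance from its mean centroid to its closest member sample, and the deviations of both samples' clusters are subtracted. The function names are hypothetical and this is not necessarily the authors' definition.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def deviation_distances(samples, labels):
    """Per-cluster distance from the mean centroid to the closest member.

    ASSUMPTION: one plausible reading of "centroid deviation distance";
    the abstract does not state the exact formula.
    """
    clusters = {}
    for point, label in zip(samples, labels):
        clusters.setdefault(label, []).append(point)
    return {
        label: min(math.dist(centroid(members), p) for p in members)
        for label, members in clusters.items()
    }


def adjusted_similarity(samples, labels):
    """Base similarity minus the deviation distance of both samples' clusters.

    ASSUMPTION: base similarity is the negative squared Euclidean distance.
    The diagonal is left at 0.0 as a placeholder; affinity propagation
    expects the caller to put its preference values there.
    """
    dev = deviation_distances(samples, labels)
    n = len(samples)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                base = -math.dist(samples[i], samples[j]) ** 2
                sim[i][j] = base - dev[labels[i]] - dev[labels[j]]
    return sim
```

Under these assumptions, the resulting matrix would be passed to any affinity propagation implementation that accepts a precomputed similarity matrix, with the diagonal filled by the chosen preference values (for example, the median of the off-diagonal similarities).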
