Fast affinity propagation clustering based on incomplete similarity matrix

Affinity propagation (AP) is a recently proposed clustering algorithm that has been successfully applied to many practical problems. Although effective in finding meaningful clustering solutions, a key disadvantage of AP is its low efficiency, which becomes the bottleneck when applying AP to large-scale problems. In the literature, most methods proposed to improve the efficiency of AP implement the message passing on a sparse similarity matrix, but neither the decline in effectiveness nor the improvement in efficiency is analyzed theoretically. In this paper, we propose a two-stage fast affinity propagation (FastAP) algorithm. Unlike previous work, the scale of the similarity matrix is first compressed by selecting only potential exemplars, and then further reduced by sparsification according to k nearest neighbors. More importantly, we provide a theoretical analysis, based on which the efficiency improvement of our method is controllable with guaranteed clustering performance. In experiments, two synthetic data sets, seven publicly available data sets, and two real-world streaming data sets are used to evaluate the proposed method. The results demonstrate that FastAP achieves clustering performance comparable to that of the original AP algorithm, while improving computational efficiency with a several-fold speed-up on small data sets and a speed-up of dozens of times on larger-scale data sets.
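
To illustrate the second stage only, the following is a minimal sketch of k-nearest-neighbor sparsification of a similarity matrix. It is not the FastAP implementation: the function names `neg_sq_dist` and `knn_sparse_similarity`, the dict-of-dicts storage, and the choice of negative squared Euclidean distance as the similarity are assumptions made for this example. The first stage (selecting potential exemplars) is omitted because the abstract does not state its selection rule.

```python
def neg_sq_dist(a, b):
    """Similarity as negative squared Euclidean distance (an assumed choice)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def knn_sparse_similarity(points, k):
    """Return {i: {j: s(i, j)}}, keeping only each point's k most similar others.

    Hypothetical helper: a dense n x n matrix is never stored; each row
    retains k entries, so messages would be passed along n * k pairs only.
    """
    n = len(points)
    sparse = {}
    for i in range(n):
        # Similarities from point i to every other point.
        sims = [(neg_sq_dist(points[i], points[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar (least negative) first
        sparse[i] = {j: s for s, j in sims[:k]}
    return sparse


# Toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
sparse = knn_sparse_similarity(points, k=2)
kept = sum(len(row) for row in sparse.values())
dense = len(points) * (len(points) - 1)
print(f"kept {kept} of {dense} off-diagonal similarities")
```

In this sketch the number of retained similarities grows as n * k rather than n * (n - 1), which is the kind of reduction the sparsification stage relies on. Note that the retained neighbor relation is not symmetric in general.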
