Distributed and Incremental Clustering Based on Weighted Affinity Propagation

Affinity Propagation (AP), a recently proposed clustering algorithm, is hindered by its quadratic complexity. The Weighted Affinity Propagation (WAP) proposed in this paper is designed to overcome this limitation and supports two scalable algorithms. Distributed AP clustering handles large datasets by merging the exemplars learned from subsets. Incremental AP extends AP to online clustering of data streams. All proposed algorithms are validated on benchmark and real-world datasets. Experimental results show that the proposed approaches offer a good trade-off between computational effort and performance.
