Frugal and Online Affinity Propagation

Abstract: Affinity Propagation (AP), a recent data clustering algorithm, suffers from a computational complexity that is quadratic in the number of data items. Several extensions of AP are proposed, aimed at online clustering in the data stream framework. First, the case of multiply defined, or weighted, items is handled by Weighted Affinity Propagation (WAP). Second, Hierarchical AP achieves distributed AP and uses WAP to merge the sets of exemplars learned from subsets of the data. Building on these two blocks, the third algorithm performs incremental Affinity Propagation on data streams. The paper validates the two algorithms on both benchmark and real-world datasets. The experimental results show that the proposed approaches perform better than K-centers-based approaches.

Keywords: Data Clustering, Data Streaming, Affinity Propagation, K-centers
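To make the quadratic cost concrete, the following is a minimal sketch of the baseline Affinity Propagation message-passing scheme (the responsibility and availability updates of Frey and Dueck), not of the weighted, hierarchical, or streaming variants summarized above. The function name `affinity_propagation`, the parameters `damping` and `iterations`, the toy data, and the preference value of -1.0 are illustrative assumptions, not taken from the paper. Both message matrices are n x n, so memory and per-iteration work grow at least quadratically with the number of items.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each item i, the index of the exemplar it selects.

    S is an n x n list of lists (n >= 2). S[i][k] is the similarity of item i
    to candidate exemplar k; the diagonal S[k][k] holds the "preference" of k.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) <- s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            row = [A[i][j] + S[i][j] for j in range(n)]
            for k in range(n):
                best_other = max(row[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k, k) <- sum over i' != k of max(0, r(i', k))
        # a(i, k) <- min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        for k in range(n):
            total = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each item picks the candidate k maximizing a(i, k) + r(i, k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy 1-D data with two well-separated groups (assumed for illustration).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
preference = -1.0  # hypothetical value; lower preferences yield fewer exemplars
S = [[preference if i == k else -(points[i] - points[k]) ** 2
      for k in range(len(points))] for i in range(len(points))]
labels = affinity_propagation(S)
print(labels)  # labels[i] is the index of the exemplar chosen by item i
```

Similarities here are negative squared distances, a common choice; any other similarity could be substituted. The sketch runs a fixed number of damped iterations instead of testing for convergence, which keeps it short but is not how a production implementation would stop.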
