K-Means Clustering over Peer-to-peer Networks

This paper presents preliminary work on an algorithm for K-means clustering of homogeneously distributed data (every node holds different records over the same set of attributes) in a peer-to-peer network. The algorithm is asynchronous, and each node operates locally, communicating only with its topological neighbors. Importantly, large-scale synchronization is not required. Empirical results show that, in many cases, the final centroids are very close to those produced by standard K-means run on the centralized data. Consequently, the number of incorrectly labeled data points is small.
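To make the neighbor-only idea concrete, here is a minimal Python sketch of one plausible local update rule. It is an illustration under stated assumptions, not the paper's algorithm. It assumes each node can ask each immediate neighbor for the per-cluster vector sums and counts of that neighbor's data under the requesting node's current centroids, and it simulates all nodes in lockstep rounds rather than truly asynchronously. All names (`cluster_stats`, `peer_round`, `run`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def cluster_stats(data: Sequence[Point], centroids: Sequence[Point]):
    """Per-cluster (vector sum, count) of `data` assigned to `centroids`."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for x in data:
        c = nearest(x, centroids)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += x[d]
    return sums, counts


def peer_round(
    local_data: Dict[int, List[Point]],
    neighbors: Dict[int, List[int]],
    centroids: Dict[int, List[Point]],
) -> Dict[int, List[Point]]:
    """One lockstep round: each node recomputes its centroids from itself and
    its immediate neighbors only (assumed protocol: a neighbor returns
    cluster_stats of its own data under the requesting node's centroids)."""
    updated = {}
    for node, cents in centroids.items():
        dim = len(cents[0])
        total_sums = [[0.0] * dim for _ in cents]
        total_counts = [0] * len(cents)
        for peer in [node] + neighbors[node]:
            sums, counts = cluster_stats(local_data[peer], cents)
            for c in range(len(cents)):
                total_counts[c] += counts[c]
                for d in range(dim):
                    total_sums[c][d] += sums[c][d]
        # An empty cluster keeps its previous centroid.
        updated[node] = [
            tuple(s / total_counts[c] for s in total_sums[c]) if total_counts[c] else cents[c]
            for c in range(len(cents))
        ]
    return updated


def run(local_data, neighbors, init, tol=1e-9, max_rounds=100):
    """Repeat rounds until no centroid coordinate on any node moves more than `tol`."""
    centroids = {n: list(init) for n in local_data}
    for _ in range(max_rounds):
        updated = peer_round(local_data, neighbors, centroids)
        shift = max(
            max(abs(a - b) for old, new in zip(centroids[n], updated[n]) for a, b in zip(old, new))
            for n in updated
        )
        centroids = updated
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    data = {0: [(0.0, 0.0), (0.2, 0.1)], 1: [(5.0, 5.0), (0.1, 0.3)], 2: [(5.2, 4.9), (4.8, 5.1)]}
    line = {0: [1], 1: [0, 2], 2: [1]}  # node 0 <-> node 1 <-> node 2
    for node, cents in sorted(run(data, line, [(0.0, 0.0), (1.0, 1.0)]).items()):
        print(node, cents)
```

On a fully connected network every node sees all the data, so each round reproduces a centralized Lloyd step exactly. On the line topology in the example, node 2 never sees node 0's points, so its first centroid settles at (0.1, 0.3) instead of the centralized (0.1, 0.133...). This is a small illustration of how neighbor-only updates can drift from the centralized result.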
