Paper information - K-Means Clustering over Peer-to-peer Networks

K-Means Clustering over Peer-to-peer Networks

This paper presents preliminary work on an algorithm for K-means clustering of homogeneously distributed data in a peer-to-peer network. The algorithm is asynchronous, and each node operates locally by communicating only with its topologically neighboring nodes. Importantly, large-scale synchronization is not required. Empirical results show that, in many cases, the final centroids produced are very close to those produced by standard K-means run on centralized data. Consequently, the number of incorrectly labeled data points is small.
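As a rough illustration of the neighbor-only communication described in the abstract, the Python sketch below has each node assign its local points to the nearest centroid, then recompute its centroids from its own per-cluster sums and counts combined with those of its immediate neighbors. It is not the authors' algorithm: it runs in synchronous rounds for simplicity, whereas the paper's algorithm is asynchronous. The function names `local_stats` and `p2p_kmeans_round`, the three-node line topology, and the sample data are all assumptions made for illustration.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts for one node's local data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def p2p_kmeans_round(data, neighbors, centroids):
    """One round: each node merges its statistics with its neighbors' only."""
    stats = {n: local_stats(data[n], centroids[n]) for n in data}
    updated = {}
    for n in data:
        group = [n] + list(neighbors[n])  # the node itself plus direct neighbors
        new = []
        for j, old in enumerate(centroids[n]):
            total = sum(stats[m][1][j] for m in group)
            if total == 0:
                new.append(old)  # keep the centroid of an empty cluster unchanged
            else:
                new.append(tuple(
                    sum(stats[m][0][j][d] for m in group) / total
                    for d in range(len(old))))
        updated[n] = new
    return updated

# Hypothetical example: three nodes connected in a line, a - b - c.
data = {
    "a": [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    "b": [(1.0, 0.0), (10.0, 11.0)],
    "c": [(1.0, 1.0), (11.0, 10.0), (11.0, 11.0)],
}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
initial = [(0.0, 0.0), (10.0, 10.0)]
centroids = {n: list(initial) for n in data}

for _ in range(5):
    centroids = p2p_kmeans_round(data, neighbors, centroids)
```

In this toy setup node `b` neighbors every other node, so its centroids match those of centralized K-means, while `a` and `c` each see only part of the data and end up with nearby but not identical centroids. This loosely mirrors the abstract's claim that the distributed result is close to the centralized one.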

H. Kargupta | S. Datta | C. Giannella

[1] Inderjit S. Dhillon, et al. A Data-Clustering Algorithm on Distributed Memory Multiprocessors, 1999, Large-Scale Parallel Data Mining.

[2] Bin Zhang, et al. Distributed data clustering can be efficient and exact, 2000, SKDD.

[3] Jiawei Han, et al. Data Mining: Concepts and Techniques, 2000.

[4] Petra Perner, et al. Data Mining - Concepts and Techniques, 2002, Künstliche Intell.

[5] Wolfgang Müller, et al. Classifying Documents by Distributed P2P Clustering, 2003, GI Jahrestagung.

[6] Wojtek Kowalczyk, et al. Towards Data Mining in Large and Fully Distributed Peer-to-Peer Overlay Networks, 2003.

[7] A. Schuster, et al. Association rule mining in peer-to-peer systems, 2004, IEEE Trans. Syst. Man Cybern. Part B.

[8] Ran Wolff, et al. Association rule mining in peer-to-peer systems, 2003, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[9] Ujjwal Maulik, et al. Clustering distributed data streams in peer-to-peer environments, 2006, Inf. Sci.