Distributed Data Mining in Peer-to-Peer Networks

Peer-to-peer (P2P) networks are gaining popularity in applications such as file sharing, e-commerce, and social networking, many of which involve rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well suited to distributed data mining (DDM), which addresses data analysis in environments where data, computing nodes, and users are all distributed. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms, which analyze data using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized, communication-efficient manner.
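As a rough illustration of an approximate, communication-efficient primitive of the kind described above, the following sketch shows randomized pairwise gossip averaging, a well-known technique for estimating a network-wide mean without any central coordinator. It is not taken from the article. The function name `gossip_average`, the assumption that any peer can contact any other peer uniformly at random, and the fixed exchange budget are all hypothetical choices made for illustration. Each exchange replaces two peers' estimates with their pairwise mean, which preserves the global sum, so every estimate drifts toward the true average.

```python
import random
from statistics import fmean


def gossip_average(values, exchanges, seed=0):
    """Approximate the global mean of `values` by randomized pairwise gossip.

    Hypothetical sketch: values[k] is peer k's local statistic, and we assume
    any peer can reach any other peer (uniform random peer selection).
    Returns every peer's final estimate and the number of messages sent.
    """
    rng = random.Random(seed)
    estimates = list(values)  # each peer starts from its own local value
    n = len(estimates)
    messages = 0
    for _ in range(exchanges):
        i, j = rng.sample(range(n), 2)  # two distinct peers exchange values
        avg = (estimates[i] + estimates[j]) / 2
        estimates[i] = estimates[j] = avg  # both adopt the pairwise mean
        messages += 2  # one message in each direction
    return estimates, messages


if __name__ == "__main__":
    rng = random.Random(42)
    local = [rng.uniform(0, 100) for _ in range(20)]  # synthetic local data
    estimates, messages = gossip_average(local, exchanges=2000)
    true_mean = fmean(local)
    worst = max(abs(e - true_mean) for e in estimates)
    print(f"true mean {true_mean:.6f}, worst peer error {worst:.2e}, messages {messages}")
```

Each peer only talks to one partner at a time and keeps a single number, which is what lets gossip schemes scale to large overlays; the price is that the result is approximate, with accuracy governed by how many exchanges are run.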
