Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems

In a large network of computers, wireless sensors, or mobile devices, each of the components (henceforth, peers) has some data about the global status of the system. Many of the system's functions, such as routing decisions, search strategies, data cleansing, and the assignment of mutual trust, depend on the global status. Therefore, it is essential that the system be able to detect, and react to, changes in its global status. Computing global predicates in such systems is usually very costly, mainly because of their scale and, in some cases (e.g., sensor networks), also because of the high cost of communication. The cost increases further when the data changes rapidly (due to state changes, node failures, etc.) and the computation has to track these changes. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm that detects when the L2 norm of the average data surpasses a threshold. Then, we use this algorithm as a feedback loop for monitoring complex predicates on the data, such as the data’s k-means clustering. The efficiency of the L2 algorithm guarantees that, so long as the clustering results represent the data (i.e., the data is stationary), few resources are required. When the data undergoes an epoch change (a change in the underlying distribution) and the model no longer represents it, the feedback loop indicates this and the model is rebuilt. Furthermore, the existence of a feedback loop allows the use of approximate and “best-effort” methods for constructing the model: if an ill-fit model is built, the feedback loop indicates so, and the model is rebuilt.
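The abstract combines two pieces: a test of whether the L2 norm of averaged data exceeds a threshold epsilon, and a feedback loop that rebuilds a k-means model only when that test signals a poor fit. The Python sketch that follows is a centralized, simplified illustration under our own assumptions, not the paper's distributed protocol. `certify` shows the convexity argument that lets a decision about the global mean be made from local means alone. If every local mean lies inside the epsilon-ball, or every local mean lies in one half-space outside it, then so does the global mean. Otherwise it returns "unknown", which is where a real peer would need to communicate. `model_still_fits` monitors a hypothetical statistic: the per-cluster mean of (x minus centroid). The function names, the fixed set of unit `directions`, and the first-k seeding of Lloyd's k-means are illustrative choices.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def certify(peer_stats: Sequence[Tuple[int, Vector]], epsilon: float,
            directions: Sequence[Vector]) -> str:
    """Classify ||global weighted mean|| against epsilon using only per-peer means.

    peer_stats holds (count, local_mean) pairs; directions are unit vectors
    (an illustrative, hypothetical choice). Returns 'inside' (norm <= epsilon),
    'outside' (norm > epsilon), or 'unknown' (a real peer would communicate).
    """
    means = [m for count, m in peer_stats if count > 0]
    # The epsilon-ball is convex: if every local mean lies in it, so does
    # every weighted average of them, including the global mean.
    if all(norm(m) <= epsilon for m in means):
        return "inside"
    # A half-space {x : u.x > epsilon} with ||u|| = 1 is convex and lies wholly
    # outside the ball, since ||x|| >= u.x > epsilon (Cauchy-Schwarz).
    for u in directions:
        if all(dot(u, m) > epsilon for m in means):
            return "outside"
    return "unknown"


def closest(p: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the centroid nearest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Lloyd iterations, seeded with the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep an empty cluster's centroid unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def model_still_fits(points: Sequence[Vector], centroids: Sequence[Vector],
                     epsilon: float) -> bool:
    """Hypothetical fit test: for each centroid c, the mean of (x - c) over the
    points assigned to c must have L2 norm at most epsilon."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = closest(p, centroids)
        counts[j] += 1
        sums[j] = [s + (x - c) for s, x, c in zip(sums[j], p, centroids[j])]
    return all(counts[j] == 0 or norm([s / counts[j] for s in sums[j]]) <= epsilon
               for j in range(len(centroids)))


def monitor(batches: Sequence[Sequence[Vector]], k: int,
            epsilon: float) -> Tuple[Optional[List[Vector]], int]:
    """Feedback loop: rebuild the model only when the fit test fails."""
    centroids: Optional[List[Vector]] = None
    rebuilds = 0
    for batch in batches:
        if centroids is None or not model_still_fits(batch, centroids, epsilon):
            centroids = kmeans(batch, k)
            rebuilds += 1
    return centroids, rebuilds


if __name__ == "__main__":
    stationary = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]]
    shifted = [[x + 3.0, y + 3.0] for x, y in stationary]  # an "epoch change"
    model, rebuilds = monitor([stationary, stationary, shifted], k=2, epsilon=0.5)
    print(rebuilds)  # 2: the initial build plus one rebuild after the shift
    print(certify([(10, [0.1, 0.2]), (5, [-0.3, 0.1])], 1.0, [[1.0, 0.0]]))  # inside
```

In the usage example, the repeated stationary batch passes the fit test at no rebuilding cost. The shifted batch moves every cluster's mean by (3, 3), far beyond epsilon, so the model is rebuilt once. This mirrors the behavior the abstract describes, although here the whole test runs in one place rather than across peers.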
