An Efficient Local Algorithm for Distributed Multivariate Regression in Peer-to-Peer Networks

This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm is designed for distributed inferencing, data compaction, data modeling, and classification tasks in many emerging peer-to-peer applications for bioinformatics, astronomy, social networking, sensor networks, and web mining. Computing a global regression model from the data available at the different peer nodes with a traditional centralized regression algorithm can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer-to-peer networks, and the dynamic nature of the data and the network. This paper proposes a two-step approach to this problem. First, it offers an efficient local distributed algorithm that monitors the “quality” of the current regression model. Second, if the model is outdated, it uses this algorithm as a feedback mechanism for rebuilding the model. The local nature of the monitoring algorithm guarantees low monitoring cost. Experimental results presented in this paper strongly support the theoretical claims.
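
The two-step idea (monitor the model's quality, rebuild only when it is outdated) can be illustrated with a minimal sketch. This is not the paper's local distributed algorithm: it simulates the feedback loop centrally, uses a single input variable for brevity, and assumes that "quality" means the mean squared error compared against a threshold. The names `fit_line`, `mean_squared_error`, `monitor_and_rebuild`, and `epsilon` are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def mean_squared_error(model, peers):
    """Average squared residual of the model over all peers' local data."""
    a, b = model
    return mean((y - (a * x + b)) ** 2 for peer in peers for x, y in peer)


def monitor_and_rebuild(model, peers, epsilon):
    """Keep the model while its error stays within epsilon (assumed
    quality criterion); otherwise refit on the pooled data.
    Returns (model, rebuilt_flag)."""
    if mean_squared_error(model, peers) > epsilon:
        pooled = [point for peer in peers for point in peer]
        return fit_line(pooled), True
    return model, False


# Two peers whose data initially follow y = 2x + 1.
peers = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
model = fit_line([p for peer in peers for p in peer])
model, rebuilt = monitor_and_rebuild(model, peers, epsilon=0.1)

# The data drift to y = 3x, so the old model becomes outdated.
drifted = [[(0, 0), (1, 3)], [(2, 6), (3, 9)]]
new_model, rebuilt_after_drift = monitor_and_rebuild(model, drifted, epsilon=0.1)
```

In this sketch the monitoring step reads every peer's data directly, which is exactly the cost a local algorithm is meant to avoid; it only shows when the rebuild is triggered.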
