On Classifying Drifting Concepts in P2P Networks

Concept drift is a common challenge in many real-world data mining and knowledge discovery applications. Most existing studies of concept drift assume a centralized setting and are often hard to adapt to distributed computing environments. In this paper, we investigate a new research problem, P2P concept drift detection, which aims to effectively classify drifting concepts in P2P networks. We propose a novel P2P learning framework for concept drift classification that includes both reactive and proactive approaches to classifying drifting concepts in a distributed manner. Our empirical study shows that the proposed technique can effectively detect drifting concepts and improve classification performance.
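To make the idea of a *reactive* approach concrete, the sketch below shows one generic way a single peer might notice drift: it tracks the 0/1 error of its local classifier over a sliding window and raises an alarm once the windowed error rate exceeds a threshold. It also shows majority voting over neighbors' models as one simple way to combine classifiers across peers. This is a minimal illustration under stated assumptions, not the framework proposed in the paper: the `Peer` class, `neighbor_vote`, and the `window` and `threshold` parameters are hypothetical, and the proactive side (anticipating drift before errors rise) is not shown.

```python
from collections import Counter, deque


class Peer:
    """Hypothetical peer: a local classifier plus a sliding window of recent
    0/1 prediction errors (an assumption for illustration only)."""

    def __init__(self, model, window=50, threshold=0.3):
        self.model = model                  # any callable: features -> label
        self.errors = deque(maxlen=window)  # oldest errors fall out automatically
        self.threshold = threshold          # hypothetical alarm level

    def observe(self, x, y):
        """Record whether the local model misclassified (x, y) and return True
        when the full window's error rate exceeds the threshold."""
        self.errors.append(int(self.model(x) != y))
        full = len(self.errors) == self.errors.maxlen
        return full and sum(self.errors) / len(self.errors) > self.threshold


def neighbor_vote(peers, x):
    """Combine neighbors' local predictions for x by simple majority vote."""
    votes = Counter(p.model(x) for p in peers)
    return votes.most_common(1)[0][0]


# Simulated stream: the concept "x > 0" abruptly flips to "x <= 0" halfway through.
peer = Peer(model=lambda x: x > 0, window=10, threshold=0.25)
stream = [(x, x > 0) for x in range(-5, 5)] + [(x, x <= 0) for x in range(-5, 5)]
alarms = [i for i, (x, y) in enumerate(stream) if peer.observe(x, y)]
print(alarms[0])  # → 12
```

After the flip every prediction is wrong, so the windowed error climbs by 0.1 per example and first exceeds 0.25 three examples after the change. A real system would also decide how to react to the alarm (for example, retraining locally or deferring to neighbors), which this sketch leaves open.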
