An Active Learning Method for Mining Time-Changing Data Streams

Many applications generate continuous, time-changing data streams. Mining them for an adaptive classifier is of great interest and poses considerable challenges. Many previous efforts impractically assume that labeled data are available and can be mined at any time. In this paper, we propose an effective active learning method to mine time-changing data streams efficiently. It monitors possible changes on the fly without needing labeled data. Once suspected changes are indicated, it employs a lightweight uncertainty sampling algorithm to choose the most informative instances to label. Using these representative labeled instances, it tests the significance of the suspected changes. If the changes indeed cause significant performance deterioration of the current classifier, it reconstructs the outdated model. Thus, our method can reliably detect significant changes, quickly adapt to concept drift, and produce effective models. Experimental results on real-world data confirm the advantages of our method.
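
The abstract names four steps (label-free change monitoring, uncertainty sampling, a significance test on the newly labeled instances, and model reconstruction) but does not specify the statistics behind them. The Python sketch below wires those steps together under loudly stated assumptions and is not the paper's algorithm. It uses a nearest-centroid classifier as a stand-in model. It treats a rise in the share of low-margin predictions as the unlabeled change signal. For the significance check, it applies a one-sided normal-approximation (z) test on the error rate. Every name (`CentroidClassifier`, `uncertain_fraction`, `suspect_change`, `uncertainty_sample`, `error_significantly_higher`, `process_window`) and every parameter (`band`, `tolerance`, `k`, `z_crit`, `baseline_error`) is hypothetical, and the defaults are arbitrary illustrative values.

```python
import math
from dataclasses import dataclass


@dataclass
class CentroidClassifier:
    """Hypothetical stand-in model: a two-class nearest-centroid classifier."""
    c0: list
    c1: list

    @classmethod
    def fit(cls, xs, ys):
        def mean(rows):
            return [sum(col) / len(rows) for col in zip(*rows)]
        return cls(mean([x for x, y in zip(xs, ys) if y == 0]),
                   mean([x for x, y in zip(xs, ys) if y == 1]))

    def prob1(self, x):
        # Soft score P(y=1): logistic of the squared-distance difference (clamped).
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        return 1.0 / (1.0 + math.exp(max(-50.0, min(50.0, d1 - d0))))

    def predict(self, x):
        return int(self.prob1(x) >= 0.5)


def uncertain_fraction(model, window, band=0.2):
    """Share of unlabeled instances whose P(y=1) lies within `band` of 0.5."""
    return sum(abs(model.prob1(x) - 0.5) < band for x in window) / len(window)


def suspect_change(model, window, ref_fraction, tolerance=0.15):
    """Label-free monitor (assumption): flag a suspected change when the share
    of low-margin predictions rises more than `tolerance` above its reference."""
    return uncertain_fraction(model, window) - ref_fraction > tolerance


def uncertainty_sample(model, window, k):
    """Uncertainty sampling: the k instances whose P(y=1) is closest to 0.5."""
    return sorted(window, key=lambda x: abs(model.prob1(x) - 0.5))[:k]


def error_significantly_higher(errors, n, baseline, z_crit=1.645):
    """One-sided normal-approximation test of H0: error rate <= baseline."""
    p0 = min(max(baseline, 1e-3), 1 - 1e-3)   # keep the standard error nonzero
    z = (errors / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z > z_crit


def process_window(model, window, oracle, ref_fraction, baseline_error, k=20):
    """One monitor -> sample -> test -> rebuild step; returns (model, rebuilt)."""
    if not suspect_change(model, window, ref_fraction):
        return model, False                    # no labels are requested
    queried = uncertainty_sample(model, window, k)
    labels = [oracle(x) for x in queried]      # label only the chosen instances
    errors = sum(model.predict(x) != y for x, y in zip(queried, labels))
    if not error_significantly_higher(errors, len(queried), baseline_error):
        return model, False                    # the change is not significant
    if len(set(labels)) < 2:
        return model, False                    # a centroid model needs both classes
    return CentroidClassifier.fit(queried, labels), True
```

In this sketch, `baseline_error` is assumed to be the model's error on uncertainty-sampled instances at training time. Instances near the decision boundary are harder than randomly drawn ones, so comparing against error on random data would inflate false alarms. This particular stand-in monitor only notices drift that pushes data toward the current decision boundary. A concept change that flips labels far from the boundary would leave the low-margin share unchanged.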