Recurrent concepts in data streams classification

This work addresses the problem of mining data streams generated in dynamic environments, where the distribution underlying the observations may change over time. We present a system that monitors the evolution of the learning process. Using change detection mechanisms, the system self-diagnoses degradations of this process and self-repairs its decision models. It relies on meta-learning techniques that characterize the domain of applicability of previously learned models. The meta-learner can detect the recurrence of contexts from unlabeled examples and act proactively by reactivating previously learned models. An experimental evaluation on three text mining problems demonstrates the main advantages of the proposed system: it provides information about the recurrence of concepts and rapidly adapts its decision models when drift occurs.
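The two mechanisms named above, change detection and reuse of stored models, can be illustrated with a minimal sketch. It is not the system described in this work. The error-rate monitor is a simple statistical-process-control style detector chosen for illustration, and every name in it (`ErrorRateMonitor`, `select_stored_model`, `drift_sigmas`, `min_samples`, `min_score`) is hypothetical. Each "referee" is assumed to be a callable that returns, for an unlabeled example, an estimated probability that its associated stored model would classify that example correctly.

```python
import math


class ErrorRateMonitor:
    """Signals drift when the running error rate rises well above its best level.

    Illustrative only: thresholds and names are assumptions, not taken from the paper.
    """

    def __init__(self, drift_sigmas=3.0, min_samples=30):
        self.drift_sigmas = drift_sigmas
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        # Call this after the decision model has been replaced or retrained.
        self.n = 0
        self.errors = 0
        self.best_sum = math.inf  # lowest (error rate + std. dev.) seen so far
        self.best_p = 0.0
        self.best_s = 0.0

    def update(self, mistake):
        """Record one prediction outcome; return True if drift is signalled."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its binomial standard deviation
        if self.n < self.min_samples:
            return False
        if p + s < self.best_sum:
            self.best_sum, self.best_p, self.best_s = p + s, p, s
        return p + s > self.best_p + self.drift_sigmas * self.best_s


def select_stored_model(referees, unlabeled_batch, min_score=0.5):
    """Pick the stored model whose referee expects it to do best on unlabeled data.

    `referees` maps a model key to a callable scoring one example in [0, 1].
    Returns the best key, or None if no referee's mean score exceeds `min_score`.
    """
    best_key, best_score = None, min_score
    for key, referee in referees.items():
        score = sum(referee(x) for x in unlabeled_batch) / len(unlabeled_batch)
        if score > best_score:
            best_key, best_score = key, score
    return best_key
```

In such a sketch, labeled examples feed the monitor to diagnose degradation, while only unlabeled examples are needed to decide which stored model, if any, to reactivate. Returning `None` covers the case where no stored model appears applicable and a new model would have to be learned.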
