Boosting classifiers for drifting concepts

In many real-world classification tasks, data arrives over time, and the target concept to be learned from the stream may change as well. Boosting methods are well suited to learning from data streams, but by themselves they do not address this concept-drift problem. This paper proposes a boosting-like method to train a classifier ensemble from a data stream that adapts naturally to concept drift and makes it possible to quantify the drift in terms of its base learners. As in regular boosting, examples are re-weighted to induce a diverse ensemble of base models. To handle drift, the proposed method continuously re-weights the ensemble members based on their performance on the most recent examples only. This strategy adapts quickly to different kinds of concept drift. Empirically, the algorithm outperforms learning algorithms that ignore concept drift, and it performs no worse than advanced adaptive time-window and example-selection strategies, which store all the data and are thus unsuited to mining massive streams. The proposed algorithm has low computational costs.
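
The abstract only outlines the method, so a minimal sketch may help make its two re-weighting steps concrete. The sketch below is an illustration, not the paper's actual algorithm: the class name `DriftBoostEnsemble`, the batch-wise `partial_fit` interface, the AdaBoost-style example-weight formula, and the accuracy-above-chance member weights are all assumptions filled in for readability, with scikit-learn decision trees standing in for the unspecified base learners and labels assumed to be encoded as -1/+1.

```python
# Minimal sketch of a boosting-like, drift-adaptive ensemble (illustrative
# only; names, weight formulas, and base learners are assumptions, not the
# paper's definitions). Assumes binary labels encoded as -1/+1.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class DriftBoostEnsemble:
    def __init__(self, max_members=10):
        self.max_members = max_members
        self.members = []   # fitted base classifiers
        self.weights = []   # one vote weight per member

    def partial_fit(self, X, y):
        """Process one batch of the stream."""
        # 1. Drift handling: re-weight existing members by their accuracy on
        #    the newest batch only, so members fitted to an outdated concept
        #    lose influence quickly (below-chance members get weight 0).
        for i, clf in enumerate(self.members):
            acc = np.mean(clf.predict(X) == y)
            self.weights[i] = max(acc - 0.5, 0.0)

        # 2. Boosting-style example re-weighting: examples the current
        #    ensemble misclassifies get higher weight, inducing a diverse
        #    new base model (an AdaBoost-like scheme, assumed here).
        if self.members and sum(self.weights) > 0:
            errors = self.predict(X) != y
            err_rate = np.clip(errors.mean(), 1e-6, 1 - 1e-6)
            w = np.where(errors, 0.5 / err_rate, 0.5 / (1 - err_rate))
        else:
            w = np.ones(len(y))

        clf = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=w)
        self.members.append(clf)
        self.weights.append(0.5)  # provisional weight until the next batch

        # 3. Keep memory bounded: drop the lowest-weighted member.
        if len(self.members) > self.max_members:
            worst = int(np.argmin(self.weights))
            del self.members[worst], self.weights[worst]

    def predict(self, X):
        """Weighted vote over the -1/+1 predictions of all members."""
        if not self.members:
            raise RuntimeError("call partial_fit before predict")
        score = sum(w * clf.predict(X)
                    for clf, w in zip(self.members, self.weights))
        return np.where(score >= 0, 1, -1)

# Example: a synthetic stream whose concept flips abruptly halfway through,
# evaluated prequentially (predict each batch before training on it).
rng = np.random.default_rng(0)
ens = DriftBoostEnsemble()
for t in range(20):
    X = rng.normal(size=(200, 2))
    sign = 1 if t < 10 else -1                   # abrupt drift at t = 10
    y = np.where(sign * X[:, 0] > 0, 1, -1)
    if t > 0:
        print(t, (ens.predict(X) == y).mean())   # accuracy before updating
    ens.partial_fit(X, y)
```

The key design point mirrors the abstract: member weights are computed from the newest batch only, so base learners tuned to an outdated concept lose their vote quickly, while the example re-weighting step keeps newly trained members diverse rather than redundant.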
