Learning in Dynamic Environments: Decision Trees for Data Streams

This paper presents an adaptive learning system for inducing a forest of trees from data streams that is able to detect concept drift. We extend our previous work on Ultra Fast Forest Trees (UFFT) with the ability to detect concept drift in the distribution of the examples. UFFT is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. The system is designed for continuous data: it uses analytical techniques to choose the splitting criteria and information gain to estimate the merit of each possible splitting test. The number of examples required to evaluate the splitting criterion is given statistical support by the Hoeffding bound. For multi-class problems the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. During the training phase the algorithm maintains a short-term memory: a fixed number of the most recent examples from the stream is kept in a data structure that supports constant-time insertion and deletion. When a test is installed, a leaf is transformed into a decision node with two descendant leaves, and the sufficient statistics of these leaves are initialized with the examples in the short-term memory that fall into them. To detect concept drift, we maintain at each inner node a naive-Bayes classifier trained with the examples that traverse the node. While the distribution of the examples is stationary, the online error of the naive-Bayes classifier will decrease; when the distribution changes, it will increase, indicating that the test installed at this node is no longer appropriate for the current distribution of the examples. When this occurs, the whole subtree rooted at that node is pruned. The methodology was tested with two artificial data sets and one real-world data set. The experimental results show good performance both in detecting concept changes and in learning the new concept.
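As a concrete illustration of the split decision, the sketch below (in Python, not taken from the paper) shows how the Hoeffding bound can be used to decide when enough examples have been seen: the best test is installed once the observed gain difference between the two highest-scoring tests exceeds the bound's epsilon. The function names and the default delta are illustrative choices.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range is within epsilon of the mean observed
    over n independent examples (Hoeffding's inequality)."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best: float, gain_second: float, n: int,
                 delta: float = 1e-6, value_range: float = 1.0) -> bool:
    """Install the best splitting test once the observed gain difference
    exceeds the Hoeffding epsilon for the examples seen so far."""
    epsilon = hoeffding_bound(value_range, delta, n)
    return (gain_best - gain_second) > epsilon
```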
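The pairwise decomposition for multi-class problems can be sketched as follows; make_binary_tree, tree.learn, and tree.predict are hypothetical interfaces standing in for UFFT's binary trees, not the paper's actual API.

```python
from collections import Counter
from itertools import combinations

def build_forest(classes, make_binary_tree):
    """One binary tree per pair of classes: k*(k-1)/2 trees for k classes."""
    return {pair: make_binary_tree() for pair in combinations(sorted(classes), 2)}

def train_example(forest, x, y):
    """An example of class y updates only the trees whose class pair
    contains y (hypothetical tree.learn interface)."""
    for (ci, cj), tree in forest.items():
        if y in (ci, cj):
            tree.learn(x, y)

def predict(forest, x):
    """Each pairwise tree votes for one of its two classes; the class
    with the most votes is the forest's prediction."""
    votes = Counter(tree.predict(x) for tree in forest.values())
    return votes.most_common(1)[0][0]
```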
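A minimal sketch of the short-term memory, assuming a plain double-ended queue for the constant-time window and a hypothetical update_statistics hook on leaves; this illustrates the idea of re-seeding new leaves from recent examples, not the paper's exact data structure.

```python
from collections import deque

class ShortTermMemory:
    """Fixed-capacity window over the most recent examples; a deque
    gives O(1) insertion and deletion at both ends."""
    def __init__(self, capacity: int):
        self.window = deque(maxlen=capacity)

    def insert(self, x, y):
        self.window.append((x, y))  # oldest example drops out automatically

def initialize_leaves(split_attr, threshold, left_leaf, right_leaf, memory):
    """After a leaf becomes a decision node, seed the sufficient
    statistics of its two descendant leaves with the buffered examples
    that would reach them (update_statistics is a hypothetical hook)."""
    for x, y in memory.window:
        leaf = left_leaf if x[split_attr] <= threshold else right_leaf
        leaf.update_statistics(x, y)
```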
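One way to turn "the online error starts rising" into a concrete test is to track the error rate of the node's naive-Bayes classifier together with its standard deviation, and flag drift when the current value rises significantly above the best value seen so far. The 2-sigma band in the sketch below is an illustrative threshold, not necessarily the exact statistic used in the paper.

```python
import math

class NodeDriftMonitor:
    """Tracks the online error of the naive-Bayes classifier kept at an
    inner node. While the distribution is stationary the error tends to
    decrease; a significant rise is taken as concept drift."""
    def __init__(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, predicted, actual) -> bool:
        self.n += 1
        self.errors += int(predicted != actual)
        p = self.errors / self.n              # current error rate
        s = math.sqrt(p * (1 - p) / self.n)   # its standard deviation
        if p + s < self.p_min + self.s_min:   # remember the best point so far
            self.p_min, self.s_min = p, s
        # drift if the error has risen well above its recorded minimum
        return p + s > self.p_min + 2 * self.s_min
```

When the monitor fires, the subtree rooted at that node would be pruned back to a leaf, so that learning can restart under the new distribution.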
