An Algorithm for Anticipating Future Decision Trees from Concept-Drifting Data

Concept drift is an important topic in practical data mining, since it is a reality in most business applications. Whenever a mining model is used in an application, it is already outdated, because the world has changed since the model was induced. The solution is to predict the drift of a model and derive a future model based on that prediction. One way would be to simulate future data and derive a model from it, but this is typically not feasible. Instead, we suggest predicting the values of the measures that drive model induction. In particular, we propose to predict the future values of attribute selection measures and of the class label distribution for the induction of decision trees. We give an example of how concept drift is reflected in the trend of these measures and show that the resulting decision trees perform considerably better than those produced by existing approaches.
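The abstract does not say which trend model is used or how the predicted measures enter tree induction. The following is a minimal sketch under stated assumptions: information gain (as used in ID3) is the attribute selection measure, attributes are categorical, the data are split into consecutive time periods, and each measure is extrapolated one period ahead with a simple least-squares linear trend. The split attribute is then chosen by its *predicted* gain rather than its current one, and the class distribution at a leaf is likewise extrapolated. All function names are hypothetical, and only a single split is shown; a full tree would apply the same step recursively.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Information gain of splitting (rows, labels) on categorical attribute attr."""
    n = len(labels)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def extrapolate_linear(values):
    """Fit y = a + b*t by least squares over t = 0..T-1; return the fit at t = T.

    Assumption: a linear trend; the source does not specify the trend model.
    """
    T = len(values)
    if T == 1:
        return values[0]
    t_mean = (T - 1) / 2
    y_mean = sum(values) / T
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(T))
    slope = num / den
    return y_mean + slope * (T - t_mean)


def predict_split_attribute(periods, attributes):
    """Pick the attribute with the highest *predicted* next-period information gain.

    periods: list of (rows, labels) pairs, oldest first.
    Predicted gains are clipped at 0 because information gain cannot be negative.
    """
    predicted = {
        a: max(0.0, extrapolate_linear([information_gain(r, y, a) for r, y in periods]))
        for a in attributes
    }
    best = max(predicted, key=predicted.get)
    return best, predicted


def predict_class_distribution(periods, classes):
    """Extrapolate each class's relative frequency, clip at 0, and renormalize."""
    raw = {}
    for c in classes:
        shares = [labels.count(c) / len(labels) for _, labels in periods]
        raw[c] = max(0.0, extrapolate_linear(shares))
    total = sum(raw.values())
    if total == 0:
        return {c: 1 / len(classes) for c in classes}  # fall back to uniform
    return {c: v / total for c, v in raw.items()}


if __name__ == "__main__":
    rows = [
        {"region": "N", "income": "hi"},
        {"region": "N", "income": "lo"},
        {"region": "S", "income": "hi"},
        {"region": "S", "income": "lo"},
    ]
    # Toy drift: the label first follows "region", later it follows "income".
    periods = [
        (rows, ["yes", "yes", "no", "no"]),
        (rows, ["yes", "no", "yes", "no"]),
    ]
    best, gains = predict_split_attribute(periods, ["region", "income"])
    print(best)  # income
```

In the toy data, "region" is the better split today, but its gain is falling while the gain of "income" is rising, so the anticipated tree splits on "income". With only a few periods a linear extrapolation can overshoot, which is why the sketch clips predictions; a more careful version might compare several regression models per measure before extrapolating.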
