Mining decision rules on data streams in the presence of concept drifts

In a database, the concept underlying an example may change over time, a phenomenon known as concept drift. When concept drift occurs, a classification model built from the old dataset is no longer suitable for predicting a new dataset. The problem of concept drift has therefore attracted considerable attention in recent years. Although many algorithms have been proposed to address it, they do not give users a satisfactory solution, because current research on concept drift focuses only on updating the classification model. Real-life decision makers, however, may be very interested in the rules that govern concept drift. For example, doctors want to know the root causes behind variation in the causes and development of a disease. In this paper, we propose a concept drift rule mining tree, called CDR-Tree, to accurately discover the underlying rules governing concept drift. The main contributions of this paper are: (a) we address the problem of mining concept-drifting rules, which has not been considered in previously developed classification schemes; (b) we develop a method that can accurately mine the rules governing concept drift; (c) we develop a method that, should classification models be required, can efficiently and accurately generate such models via a simple extraction procedure rather than constructing them anew; and (d) we propose two strategies to reduce the complexity of the concept-drifting rules mined by our CDR-Tree.
