Class Specific Fuzzy Decision Trees for Mining High Speed Data Streams

In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised promptly and accordingly. A common way to detect concept change is to observe the online classification accuracy: if accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in accuracy can be interpreted as a symptom of concept change. Unfortunately, this assumption is often violated in the real world, where data streams carry noise and missing values that can also cause a significant reduction in classification accuracy. To compound the problem, traditional noise cleansing methods are not applicable to data streams. These methods normally need to scan the data multiple times, whereas learning in data streams can afford only a single pass because of the data's high speed and huge volume. To solve these problems, this paper proposes a novel classification algorithm, Class Specific Fuzzy Decision Trees (CSFDT), which uses fuzzy logic to classify data streams. The base classifier of CSFDT is a binary fuzzy decision tree. Whenever the problem of concern contains q classes (q > 2), CSFDT learns one binary classifier for each class to distinguish instances of this class from instances of the remaining (q − 1) classes. CSFDT's advantages are threefold. First, it offers an adaptive structure to handle concept change effectively and efficiently. Second, it is robust to noise. Third, it deals with missing values in an elegant way. As a result, a drop in accuracy can be safely attributed to concept change. Extensive evaluations are conducted to compare CSFDT with representative existing data stream classification algorithms on a large variety of data. Experimental results suggest that CSFDT provides a significant benefit to data stream classification in real-world scenarios where concept changes, noise and missing values coexist.
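
To make two ideas from the abstract concrete, the following is a minimal Python sketch, not the CSFDT implementation. It shows (a) the common threshold-based check on online accuracy that the abstract describes as the usual way to detect concept change, and (b) the relabeling step behind learning one binary classifier per class. The names `AccuracyMonitor` and `one_vs_rest_labels`, the sliding-window formulation, and the default window size and threshold are all assumptions made for illustration; the abstract does not specify them, and the fuzzy decision tree itself is not modeled here.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical helper: flags a suspected concept change when the
    accuracy over a sliding window falls below a fixed threshold."""

    def __init__(self, window_size=100, threshold=0.7):
        # Assumed parameters; the abstract only says "some threshold value".
        self.window = deque(maxlen=window_size)  # 1 = correct, 0 = wrong
        self.threshold = threshold

    def update(self, predicted, actual):
        """Record one prediction; return True if a change is suspected."""
        self.window.append(1 if predicted == actual else 0)
        if len(self.window) < self.window.maxlen:
            return False  # window not yet full, so no decision is made
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.threshold


def one_vs_rest_labels(labels, classes):
    """Hypothetical helper: build one binary label sequence per class,
    with 1 for instances of that class and 0 for all remaining classes."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


if __name__ == "__main__":
    monitor = AccuracyMonitor(window_size=4, threshold=0.5)
    stream = [(1, 1), (1, 1), (0, 1), (0, 1), (0, 1)]  # (predicted, actual)
    print([monitor.update(p, a) for p, a in stream])
    print(one_vs_rest_labels(["a", "b", "c", "a"], ["a", "b", "c"]))
```

Note that a monitor of this kind cannot tell whether the misclassifications come from a changed concept or from noisy or incomplete instances, which is exactly the ambiguity the abstract raises.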
