An Efficient Decision Tree Classification Method Based on Extended Hash Table for Data Streams Mining

This paper focuses on handling continuous attributes when mining data streams with concept drift. A data stream is an incremental, online, real-time data model. Domingos and Hulten presented a one-pass algorithm; their system, VFDT, uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the constructed tree. CVFDT, the extended version of VFDT, handles concept drift efficiently. In this paper, we revisit this problem and implement a system, HashCVFDT, on top of CVFDT. It is as fast as a hash table when inserting, seeking, or deleting attribute values, and it can also sort the attribute values.
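As a rough illustration of the two ideas mentioned above, the following Python sketch computes the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and shows one possible way to keep per-value class counts in a hash table while still producing sorted values on demand. The abstract does not describe the internals of the extended hash table, so the class `SortedOnDemandCounts`, its method names, and the choice to sort only when a sorted view is requested are assumptions made for illustration, not the HashCVFDT design.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean observed
    over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class SortedOnDemandCounts:
    """Hypothetical stand-in for a hash-based store of continuous attribute values.

    Assumption: values are kept in a plain dict (average O(1) insert, seek,
    delete) and are sorted only when a sorted view is requested, for example
    when candidate split points are evaluated.
    """

    def __init__(self):
        self._counts = {}  # attribute value -> {class label: count}

    def insert(self, value, label):
        per_class = self._counts.setdefault(value, {})
        per_class[label] = per_class.get(label, 0) + 1

    def delete(self, value, label):
        """Remove one occurrence, e.g. when an old example is forgotten.
        Returns False if no such occurrence is stored."""
        per_class = self._counts.get(value)
        if not per_class or per_class.get(label, 0) == 0:
            return False
        per_class[label] -= 1
        if per_class[label] == 0:
            del per_class[label]
        if not per_class:
            del self._counts[value]
        return True

    def seek(self, value):
        """Return a copy of the class counts stored for `value` (empty if absent)."""
        return dict(self._counts.get(value, {}))

    def sorted_values(self):
        """Return the distinct stored values in ascending order (O(k log k))."""
        return sorted(self._counts)


if __name__ == "__main__":
    # The bound shrinks as more examples arrive.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(hoeffding_bound(1.0, 1e-7, 100000))

    store = SortedOnDemandCounts()
    for v, c in [(3.2, "a"), (1.5, "b"), (3.2, "b"), (0.7, "a")]:
        store.insert(v, c)
    store.delete(1.5, "b")
    print(store.seek(3.2))
    print(store.sorted_values())
```

The trade-off in this sketch is that sorting is deferred: updates stay cheap, and the sorting cost is paid only when an ordered scan over the values is needed.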
