Enhancing Very Fast Decision Trees with Local Split-Time Predictions

An increasing number of industrial areas recognize the opportunities of Big Data and require highly efficient algorithms that enable real-time processing, reducing the burden of data storage and maintenance. Decision trees are extremely fast, highly accurate, and easy to use in practice. Combining multiple decision trees into an ensemble yields some of the most powerful machine learning methods. The Very Fast Decision Tree (VFDT) is the state-of-the-art incremental decision-tree induction algorithm, capable of learning from massive data streams. Its success rests on theoretical guarantees based on the Hoeffding bound as well as on its competitive classification accuracy and time and space efficiency. In this paper, we increase this efficiency even further by replacing its global splitting scheme, which periodically attempts a split every n_min examples. Instead, we utilize local statistics to predict the split-time, thereby avoiding unnecessary split-attempts, which usually dominate the computational cost. Concretely, we use the class distributions of previous split-attempts to approximate the minimum number of examples until the Hoeffding bound is met. By design, this cautious approach incurs only a low split delay while reducing the number of split-attempts. We extensively evaluate our method on common stream-learning benchmarks, also considering non-stationary environments. The experiments confirm a substantially reduced run-time without a loss in classification performance.
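
To make the split-time prediction concrete: VFDT splits a leaf once the gap ΔG between the heuristic values (e.g., information gain) of the best and second-best attributes exceeds the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the heuristic and n the number of examples seen at the leaf. Inverting ε(n) ≤ ΔG for n gives the minimum number of examples at which the bound would be met if the gap stayed roughly constant. The sketch below (hypothetical names, a minimal illustration under this constant-gap assumption, not the authors' implementation) computes that predicted split-time:

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound for a statistic with the given range,
    observed n times, holding with probability 1 - delta."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def predict_split_time(gain_gap: float, value_range: float,
                       delta: float, n_seen: int) -> int:
    """Predict how many more examples a leaf must observe before the
    Hoeffding bound drops below gain_gap, the difference between the
    best and second-best split heuristic at the last split-attempt.

    Inverting epsilon(n) <= gain_gap yields
        n >= range^2 * ln(1/delta) / (2 * gain_gap^2).
    Assuming the gap stays roughly constant makes the prediction
    cautious: if the gap widens instead, the bound is simply met at
    the next (slightly delayed) attempt.
    """
    if gain_gap <= 0.0:
        # No discernible winner yet: wait as long again before retrying.
        return max(n_seen, 1)
    n_required = math.ceil(
        (value_range ** 2) * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    )
    return max(n_required - n_seen, 1)

# Example: information gain on a two-class problem (range = log2(2) = 1),
# confidence delta = 1e-7, observed gain gap of 0.05 after 200 examples.
print(predict_split_time(gain_gap=0.05, value_range=1.0,
                         delta=1e-7, n_seen=200))   # -> 3024
```

In a stream learner, the leaf would then defer its next split-attempt by the returned number of examples, rather than re-evaluating all candidate splits every n_min examples as in the global scheme.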
