Learning from Data Stream Based on Random Projection and Hoeffding Tree Classifier

In this study, we introduce an ensemble-based approach for online machine learning. Instead of working on the original data, several Hoeffding tree classifiers make predictions on, and are updated with, lower-dimensional projections of the original data generated by random projections. Because random projection is unstable (different random matrices yield different projections), a single example gives rise to many diverse training instances with which to train the set of Hoeffding tree classifiers. Experiments on a number of datasets drawn from different sources demonstrate that the proposed approach performs significantly better than a single Hoeffding tree and several well-known online learning algorithms, including additive models and Online Bagging.
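The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Several details are assumptions that the abstract does not state: the projection matrices are Gaussian, the ensemble combines members by majority vote, and all class and function names are hypothetical. To keep the example free of external dependencies, the base learner is a simple online nearest-centroid classifier standing in for a Hoeffding tree; any incremental learner exposing `learn_one` and `predict_one` could be substituted.

```python
import math
import random
from collections import Counter


def random_projection_matrix(n_features, n_components, rng):
    """Gaussian random matrix with entries drawn from N(0, 1/n_components).

    Assumption: the abstract does not say which random matrix is used.
    """
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
            for _ in range(n_components)]


def project(matrix, x):
    """Map x from n_features dimensions down to n_components dimensions."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


class OnlineCentroid:
    """Stand-in incremental learner (NOT a Hoeffding tree).

    Keeps a running sum per class and predicts the class whose mean
    is nearest to the projected example.
    """

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def learn_one(self, z, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(z)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], z)]
        self.counts[y] += 1

    def predict_one(self, z):
        if not self.counts:
            return None  # nothing learned yet

        def squared_distance(y):
            n = self.counts[y]
            return sum((s / n - v) ** 2 for s, v in zip(self.sums[y], z))

        return min(self.counts, key=squared_distance)


class RandomProjectionEnsemble:
    """Each member sees the stream through its own fixed random projection."""

    def __init__(self, n_features, n_components, n_members, make_learner, seed=0):
        rng = random.Random(seed)
        self.matrices = [random_projection_matrix(n_features, n_components, rng)
                         for _ in range(n_members)]
        self.learners = [make_learner() for _ in range(n_members)]

    def predict_one(self, x):
        # Majority vote over members (an assumed combining rule).
        votes = Counter()
        for matrix, learner in zip(self.matrices, self.learners):
            label = learner.predict_one(project(matrix, x))
            if label is not None:
                votes[label] += 1
        return votes.most_common(1)[0][0] if votes else None

    def learn_one(self, x, y):
        # One incoming example yields a different projected view per member.
        for matrix, learner in zip(self.matrices, self.learners):
            learner.learn_one(project(matrix, x), y)


if __name__ == "__main__":
    # Test-then-train loop on a synthetic two-class stream.
    stream_rng = random.Random(1)
    ensemble = RandomProjectionEnsemble(20, 5, 7, OnlineCentroid, seed=42)
    correct = 0
    n_examples = 500
    for i in range(n_examples):
        y = i % 2
        mean = 3.0 if y == 1 else -3.0
        x = [stream_rng.gauss(mean, 1.0) for _ in range(20)]
        correct += int(ensemble.predict_one(x) == y)
        ensemble.learn_one(x, y)
    print("prequential accuracy:", correct / n_examples)
```

Each member keeps its projection matrix fixed for the lifetime of the stream, so diversity comes only from the differing matrices; this follows the abstract's remark that one example yields many different training instances.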
