Random Projection Random Discretization Ensembles—Ensembles of Linear Multivariate Decision Trees

In this paper, we present a novel ensemble method, Random Projection Random Discretization Ensembles (RPRDE), that creates ensembles of linear multivariate decision trees using a univariate decision tree algorithm. The proposed method combines the lower computational cost of a univariate decision tree algorithm with the greater representational power of linear multivariate decision trees. We develop a random discretization (RD) method that creates randomly discretized features from continuous features. Random projection (RP) is used to create new features that are linear combinations of the original features. A new dataset is created by augmenting the discretized features (produced by RD) with the features produced by RP. Each decision tree of an RPRD ensemble is trained on one dataset from this pool using a univariate decision tree algorithm. Because the RP features make these trees multivariate, they have more representational power than univariate decision trees, so we expect accurate decision trees in the ensemble. Diverse training datasets ensure diverse decision trees in the ensemble. We study the performance of RPRDE against other popular ensemble techniques using the C4.5 tree as the base classifier; RPRDE matches or outperforms these methods. Experimental results also suggest that the proposed method is quite robust to class noise.
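
The following is a minimal Python sketch of the idea described above, assuming scikit-learn's DecisionTreeClassifier as a stand-in for C4.5. The parameter names (n_trees, n_bins, n_proj) and the particular sampling schemes for the discretization thresholds and projection weights are illustrative assumptions, not the exact procedure of the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in for C4.5

class RPRDEnsemble:
    """Sketch of a Random Projection Random Discretization ensemble.

    Assumes non-negative integer class labels; hyperparameters are
    illustrative, not taken from the paper.
    """

    def __init__(self, n_trees=50, n_bins=4, n_proj=5, random_state=None):
        self.n_trees = n_trees
        self.n_bins = n_bins      # random thresholds per feature (RD)
        self.n_proj = n_proj      # random linear combinations (RP)
        self.rng = np.random.default_rng(random_state)
        self.members = []         # list of (thresholds, projection, tree)

    def _transform(self, X, thresholds, projection):
        # RD: discretize each continuous feature with its own random thresholds.
        rd = np.stack([np.digitize(X[:, j], np.sort(thresholds[j]))
                       for j in range(X.shape[1])], axis=1)
        # RP: new features as random linear combinations of the originals.
        rp = X @ projection
        # Augment the discretized features with the projected features.
        return np.hstack([rd, rp])

    def fit(self, X, y):
        lo, hi = X.min(axis=0), X.max(axis=0)
        for _ in range(self.n_trees):
            # Independent random thresholds and projection per ensemble member.
            thresholds = self.rng.uniform(lo, hi, size=(self.n_bins, X.shape[1])).T
            projection = self.rng.normal(size=(X.shape[1], self.n_proj))
            Z = self._transform(X, thresholds, projection)
            tree = DecisionTreeClassifier().fit(Z, y)
            self.members.append((thresholds, projection, tree))
        return self

    def predict(self, X):
        votes = np.stack([tree.predict(self._transform(X, t, p))
                          for t, p, tree in self.members])
        # Majority vote over the member trees for each sample.
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

Because each member sees its own randomly discretized and randomly projected version of the data, the trees are both diverse (different training representations) and multivariate in the original feature space (splits on RP features are linear combinations of original features), which is the effect the abstract describes.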
