Sparse Projection Oblique Randomer Forests

Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees, that is, trees that split only along individual feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of axis-aligned decision forests, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called "Sparse Projection Oblique Randomer Forests" (SPORF). SPORF splits on very sparse random projections, i.e., linear combinations of a small subset of features. SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite of more than 100 classification problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests while maintaining computational efficiency, scalability, and interpretability. SPORF can easily be incorporated into other ensemble methods, such as boosting, to obtain potentially similar gains.
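To make the split mechanism concrete, the sketch below shows one way a SPORF-style tree node could draw very sparse random projections and then search the projected features for a split. This is a minimal illustration under assumptions chosen for clarity, not the reference implementation: the function names, the ±1 coefficient scheme, the `density` parameter, and the exhaustive Gini search over thresholds are all illustrative choices.

```python
# Illustrative sketch of a SPORF-style node split (assumptions noted above).
import numpy as np

def sparse_random_projection(p, d, density=0.1, rng=None):
    """Draw a very sparse random projection matrix A of shape (p, d).

    Each entry is nonzero with probability `density`; nonzero entries are
    +1 or -1 with equal probability, so each of the d candidate projections
    is a sparse signed combination of a few of the p input features.
    """
    rng = np.random.default_rng(rng)
    mask = rng.random((p, d)) < density
    signs = rng.choice([-1.0, 1.0], size=(p, d))
    return mask * signs

def _gini(labels):
    """Gini impurity of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    q = counts / counts.sum()
    return 1.0 - np.sum(q ** 2)

def best_split_on_projections(X, y, A):
    """Project the node's samples and exhaustively scan every projected
    feature for the threshold minimizing weighted Gini impurity."""
    Z = X @ A                              # projected samples, shape (n, d)
    n = len(y)
    best = (np.inf, None, None)            # (impurity, column, threshold)
    for j in range(Z.shape[1]):
        order = np.argsort(Z[:, j])
        z, labels = Z[order, j], y[order]
        for i in range(1, n):
            if z[i] == z[i - 1]:
                continue
            left, right = labels[:i], labels[i:]
            gini = (len(left) * _gini(left) + len(right) * _gini(right)) / n
            if gini < best[0]:
                best = (gini, j, (z[i] + z[i - 1]) / 2.0)
    return best

# Toy usage: one oblique split on a small two-class problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # signal lies in a 2-sparse direction
A = sparse_random_projection(p=20, d=10, density=0.1, rng=1)
impurity, column, threshold = best_split_on_projections(X, y, A)
print(impurity, column, threshold)
```

In this sketch, `density` plays the role of the sparsity hyperparameter: with roughly 1/p of entries nonzero, most candidate projections reduce to single features and the node behaves like an axis-aligned split, while larger values yield sparse oblique combinations of a handful of features.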
