Improved Boosted Regression Forests Through Non-Greedy Tree Optimization

Regression forests (ensembles of regression trees) are widely regarded as the leading off-the-shelf method for regression. One of the main approaches to constructing such forests is boosting. However, most current boosting implementations employ axis-aligned trees as base learners, where each decision node tests a single feature. Moreover, such trees are usually trained with greedy top-down algorithms such as CART, which are known to be suboptimal. We instead use oblique trees, where each decision node tests a linear combination of features, and train them with the recently proposed non-greedy tree learning method, Tree Alternating Optimization (TAO). We embed TAO as the base learner in the boosting framework and show its effectiveness in the regression setting: it produces much better forests than other tree ensembling methods in terms of error, model size and inference time. This has immediate practical impact on applications such as signal processing, data mining and computer vision.
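
As a reading aid, below is a minimal sketch of the least-squares gradient boosting loop into which such a tree learner is embedded. It assumes squared-error loss; scikit-learn's axis-aligned DecisionTreeRegressor is used only as a runnable stand-in for the TAO-trained sparse oblique trees described above, and the function names (fit_boosted_forest, predict) are illustrative, not the authors' implementation.

```python
# Minimal sketch of least-squares gradient boosting (Friedman-style).
# The point is the boosting loop, not the base learner: the paper replaces
# the axis-aligned stand-in below with a sparse oblique tree trained by TAO.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_forest(X, y, n_trees=100, learning_rate=0.1, max_depth=4):
    """Fit an additive model F(x) = f0 + nu * sum_m t_m(x) to squared error."""
    f0 = float(y.mean())                       # constant initial model
    F = np.full(y.shape, f0, dtype=float)      # current ensemble prediction
    trees = []
    for _ in range(n_trees):
        residuals = y - F                      # negative gradient of 1/2 (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)  # stand-in base learner
        tree.fit(X, residuals)                 # the paper would run TAO on an oblique tree here
        F += learning_rate * tree.predict(X)   # shrunken additive update
        trees.append(tree)
    return f0, learning_rate, trees


def predict(model, X):
    f0, nu, trees = model
    return f0 + nu * sum(t.predict(X) for t in trees)


if __name__ == "__main__":
    # Tiny synthetic regression problem to exercise the loop.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + 0.1 * rng.normal(size=500)
    model = fit_boosted_forest(X, y)
    print("train RMSE:", np.sqrt(np.mean((y - predict(model, X)) ** 2)))
```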
