Optimal Weighted Random Forests

The random forest (RF) algorithm has become a very popular prediction method owing to its great flexibility and promising accuracy. In RF, it is conventional to put equal weights on all the base learners (trees) when aggregating their predictions. However, the predictive performance of individual trees within the forest can differ substantially because of the randomization introduced by bootstrap sampling and feature selection. In this paper, we focus on RF for regression and propose two optimal weighting algorithms, namely the one-step optimal weighted RF (1step-WRF$_\mathrm{opt}$) and the two-steps optimal weighted RF (2steps-WRF$_\mathrm{opt}$), which combine the base learners through weights determined by weight choice criteria. Under some regularity conditions, we show that these algorithms are asymptotically optimal in the sense that the resulting squared loss and risk are asymptotically identical to those of the infeasible but best possible model averaging estimator. Numerical studies on real-world data sets indicate that, in most cases, these algorithms outperform the equal-weight forest and two other weighted RFs proposed in the existing literature.
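To make the idea of weighted aggregation concrete, the sketch below solves a generic version of the weight-choice problem: given each tree's (e.g., out-of-bag) predictions and the targets, it finds convex-combination weights minimizing squared loss via projected gradient descent. This is only an illustration of the general technique; the paper's specific 1step-WRF$_\mathrm{opt}$ and 2steps-WRF$_\mathrm{opt}$ criteria are not reproduced here, and the function and variable names are hypothetical.

```python
import numpy as np

def optimize_tree_weights(preds, y, n_iter=2000, lr=0.01):
    """Choose convex-combination weights for tree predictions by
    minimizing squared loss with projected gradient descent.

    preds : (n_samples, n_trees) per-tree predictions (e.g., out-of-bag)
    y     : (n_samples,) regression targets
    Returns weights on the probability simplex (non-negative, summing to 1).

    NOTE: a generic sketch of weighted aggregation, not the paper's
    actual weight-choice criteria.
    """
    n, T = preds.shape
    w = np.full(T, 1.0 / T)  # start from equal weights (plain RF)
    for _ in range(n_iter):
        grad = 2.0 * preds.T @ (preds @ w - y) / n
        w = w - lr * grad
        # project back onto the simplex (Euclidean projection)
        u = np.sort(w)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u + (1.0 - css) / np.arange(1, T + 1) > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1)
        w = np.maximum(w + theta, 0.0)
    return w

# Toy example: tree 0 is accurate, trees 1-2 are much noisier.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = np.column_stack([
    y + 0.1 * rng.normal(size=200),  # low-noise tree
    y + 1.0 * rng.normal(size=200),  # noisy tree
    y + 1.0 * rng.normal(size=200),  # noisy tree
])
w = optimize_tree_weights(preds, y)
```

In this setup the optimized weights concentrate on the accurate tree, so the weighted forest's squared loss is no larger than that of the equal-weight average, which is the basic motivation for weighting trees unequally.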
