Gradient Boosted Binary Histogram Ensemble for Large-scale Regression

In this paper, we propose a gradient boosting algorithm for large-scale regression problems, called Gradient Boosted Binary Histogram Ensemble (GBBHE), which is based on binary histogram partitions and ensemble learning. From the theoretical perspective, assuming Hölder continuity of the target function, we establish the statistical convergence rates of GBBHE in the spaces $C^{0,\alpha}$ and $C^{1,\alpha}$, where a lower bound on the convergence rate of the base learner demonstrates the advantage of boosting. Moreover, in the space $C^{1,\alpha}$, we prove that the number of iterations required to achieve the fast convergence rate can be reduced by using an ensemble regressor as the base learner, which improves computational efficiency. In our experiments, compared with state-of-the-art algorithms such as gradient boosted regression trees (GBRT), Breiman's forest, and kernel-based methods, GBBHE shows promising performance with lower running time on large-scale datasets.
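To make the recipe above concrete, here is a minimal, illustrative sketch in Python of the general construction the abstract describes: squared-loss gradient boosting whose base learner at each iteration is an average of several regressors, each fit on an independent random binary histogram partition (repeated midpoint bisection of the input cube along randomly chosen coordinates). This is a sketch under our own assumptions, not the authors' implementation; the class names and parameters (BinaryHistogramRegressor, GBBHESketch, depth, n_estimators, learning_rate) are hypothetical.

    import numpy as np

    class BinaryHistogramRegressor:
        """Piecewise-constant regressor on a random binary partition:
        `depth` rounds of midpoint bisection, each along a randomly drawn
        coordinate, give 2**depth cells; each cell predicts the mean
        training target falling inside it. (Illustrative sketch.)"""

        def __init__(self, depth=6, rng=None):
            self.depth = depth
            self.rng = rng if rng is not None else np.random.default_rng()

        def fit(self, X, y):
            self.lo_ = X.min(axis=0)
            self.hi_ = X.max(axis=0) + 1e-12      # avoid zero-width features
            # One random split coordinate per bisection level.
            self.dims_ = self.rng.integers(0, X.shape[1], size=self.depth)
            cells = self._cell_index(X)
            # Empty cells fall back to the global mean of the fitted targets.
            self.values_ = np.full(2 ** self.depth, y.mean())
            for c in np.unique(cells):
                self.values_[c] = y[cells == c].mean()
            return self

        def _cell_index(self, X):
            # Rescale to [0, 1]^d, then record which half of the current
            # cell each point falls in at every bisection level.
            Z = np.clip((X - self.lo_) / (self.hi_ - self.lo_), 0.0, 1.0 - 1e-12)
            idx = np.zeros(len(X), dtype=np.int64)
            for d in self.dims_:
                bit = (Z[:, d] >= 0.5).astype(np.int64)
                idx = 2 * idx + bit
                Z[:, d] = 2.0 * Z[:, d] - bit     # zoom into the chosen half
            return idx

        def predict(self, X):
            return self.values_[self._cell_index(X)]

    class GBBHESketch:
        """Squared-loss gradient boosting in which each boosting step fits
        an average of n_estimators independent binary histogram regressors
        to the current residuals (the negative gradient)."""

        def __init__(self, n_iter=50, n_estimators=5, depth=6,
                     learning_rate=0.1, seed=0):
            self.n_iter = n_iter
            self.n_estimators = n_estimators
            self.depth = depth
            self.learning_rate = learning_rate
            self.rng = np.random.default_rng(seed)

        def fit(self, X, y):
            self.init_ = y.mean()
            pred = np.full(len(y), self.init_)
            self.stages_ = []
            for _ in range(self.n_iter):
                residual = y - pred               # negative gradient of L2 loss
                stage = [BinaryHistogramRegressor(self.depth, self.rng).fit(X, residual)
                         for _ in range(self.n_estimators)]
                pred += self.learning_rate * np.mean(
                    [h.predict(X) for h in stage], axis=0)
                self.stages_.append(stage)
            return self

        def predict(self, X):
            pred = np.full(len(X), self.init_)
            for stage in self.stages_:
                pred += self.learning_rate * np.mean(
                    [h.predict(X) for h in stage], axis=0)
            return pred

    # Quick smoke test on synthetic data.
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(2000, 3))
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(2000)
    model = GBBHESketch().fit(X, y)
    print("train MSE:", np.mean((model.predict(X) - y) ** 2))

Averaging several independently drawn partitions within each boosting step smooths the piecewise-constant base learner; this ensemble step is what the abstract credits with reducing the number of boosting iterations needed to reach the fast convergence rate.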
