A comparative analysis of gradient boosting algorithms

The family of gradient boosting algorithms has recently been extended with several interesting proposals (namely XGBoost, LightGBM and CatBoost) that focus on both speed and accuracy. XGBoost is a scalable ensemble technique that has proven to be a reliable and efficient solver of machine learning challenges. LightGBM is an accurate model focused on providing extremely fast training by selectively sampling high-gradient instances. CatBoost modifies the computation of gradients to avoid prediction shift and thereby improve model accuracy. This work presents a practical analysis of how these novel variants of gradient boosting behave in terms of training speed, generalization performance and hyper-parameter setup. In addition, a comprehensive comparison between XGBoost, LightGBM, CatBoost, random forests and gradient boosting has been performed using carefully tuned models as well as their default settings. The results of this comparison indicate that CatBoost obtains the best generalization accuracy and AUC on the studied datasets, although the differences are small. LightGBM is the fastest of all methods but not the most accurate. XGBoost places second both in accuracy and in training speed. Finally, an extensive analysis of the effect of hyper-parameter tuning on XGBoost, LightGBM and CatBoost is carried out using two novel proposed tools.
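As a rough illustration of the kind of comparison described above, the sketch below trains the five ensemble methods with their default settings and records accuracy, AUC and training time. The synthetic dataset, split and metric choices are assumptions for illustration only and do not reproduce the paper's actual benchmark protocol.

```python
# Minimal sketch (not the paper's exact protocol): default-setting comparison
# of the five ensemble methods on a stand-in binary classification dataset.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Synthetic data stands in for the benchmark datasets used in the study.
X, y = make_classification(n_samples=10_000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "xgboost": XGBClassifier(random_state=0),
    "lightgbm": LGBMClassifier(random_state=0),
    "catboost": CatBoostClassifier(random_state=0, verbose=0),
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)  # default hyper-parameters only
    elapsed = time.perf_counter() - start

    acc = accuracy_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name:>18}: acc={acc:.4f}  auc={auc:.4f}  train_time={elapsed:.2f}s")
```

A tuned comparison would wrap each estimator in a hyper-parameter search (for example scikit-learn's GridSearchCV or a Bayesian optimizer) before timing and scoring, mirroring the tuned-versus-default contrast studied in the paper.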