An XGBoost risk model via feature selection and Bayesian hyper-parameter optimization

This paper explores business risk classification models based on extreme gradient boosting (XGBoost). Feature selection (FS) and hyper-parameter optimization are considered jointly during model training. Five commonly used FS methods, namely weight by Gini index, weight by chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and the Bayesian tree-structured Parzen estimator (TPE), are applied to XGBoost. The effects of the different FS and hyper-parameter optimization methods on model performance are assessed with the Wilcoxon signed-rank test. The performance of XGBoost is compared to the traditionally used logistic regression (LR) model in terms of classification accuracy, area under the curve (AUC), recall, and F1 score obtained from 10-fold cross-validation. Results show that hierarchical clustering is the optimal FS method for LR, while weight by chi-square achieves the best performance for XGBoost. Both TPE- and RS-optimized XGBoost significantly outperform LR. TPE is superior to RS: it yields significantly higher accuracy and marginally higher AUC, recall, and F1 score. Furthermore, XGBoost with TPE tuning shows lower variability than with RS. Finally, the feature importance ranking produced by XGBoost enhances model interpretability. Therefore, XGBoost with Bayesian TPE hyper-parameter optimization serves as a practical and powerful approach for business risk modeling.
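To make the tuning pipeline concrete, the snippet below is a minimal sketch of TPE-based hyper-parameter search for an XGBoost classifier using the Hyperopt library, with mean 10-fold cross-validated AUC as the objective. The dataset, search-space bounds, and trial count are illustrative placeholders and are not taken from the paper.

```python
# Minimal sketch: Bayesian TPE tuning of XGBoost via Hyperopt.
# All ranges and counts below are assumptions for illustration only.
import numpy as np
import xgboost as xgb
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Synthetic binary-classification data standing in for the business risk set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Hypothetical search space; the paper's actual bounds may differ.
space = {
    "max_depth": hp.quniform("max_depth", 3, 10, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "subsample": hp.uniform("subsample", 0.5, 1.0),
    "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0),
    "min_child_weight": hp.quniform("min_child_weight", 1, 10, 1),
}

def objective(params):
    # quniform returns floats; XGBoost expects integers for these.
    params["max_depth"] = int(params["max_depth"])
    params["min_child_weight"] = int(params["min_child_weight"])
    clf = xgb.XGBClassifier(n_estimators=200, eval_metric="logloss", **params)
    # Mean AUC over 10-fold CV; Hyperopt minimizes, so negate the score.
    auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean()
    return {"loss": -auc, "status": STATUS_OK}

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print("Best hyper-parameters:", best)
```

For the paired significance comparisons the abstract describes, the per-fold scores of two tuned models can be compared with `scipy.stats.wilcoxon`; swapping `tpe.suggest` for `hyperopt.rand.suggest` gives the random-search baseline under the same budget.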
