Weighted Quantile Regression Forests for Bimodal Distribution Modeling: A Loss Given Default Case

Due to various regulations (e.g., the Basel III Accord), banks need to hold a specified amount of capital to reduce the impact of potential insolvency. This capital can be calculated using, e.g., the Internal Ratings-Based approach, which allows institutions to develop their own statistical models. In this regard, one of the most important parameters is the loss given default, whose correct estimation may lead to a healthier and less risky allocation of capital. Unfortunately, because the loss given default distribution is bimodal, modeling methods that aim at predicting the mean value (e.g., ordinary least squares or regression trees) are not sufficient. Bimodality means that a distribution has two modes and a large proportion of observations lying far from the middle of the distribution, so a predicted mean can fall in a region where few observations actually occur; more advanced methods are therefore required. To model the entire loss given default distribution, in this article we present the weighted quantile Regression Forest algorithm, an ensemble technique. We evaluate our methodology on a dataset collected by one of the biggest Polish banks and show that weighted quantile Regression Forests outperform “single” state-of-the-art models in terms of both accuracy and stability.
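
The point that a mean-oriented model is insufficient for a bimodal target can be illustrated with the prediction step of a quantile regression forest. The following is a minimal Python sketch, assuming Meinshausen's (2006) leaf-based weighting: each tree spreads its weight evenly over the training observations that share the query point's leaf, and quantiles are read off the resulting weighted empirical distribution. The leaf assignments, the toy LGD values and the optional `tree_weights` argument are hypothetical; in particular, `tree_weights` only stands in for some per-tree weighting and is not the weighting scheme proposed in this article.

```python
"""Toy sketch of the prediction step of a (tree-weighted) quantile regression forest.

Assumptions: leaf assignments are supplied directly instead of being produced by
fitted trees, and the per-tree weights are hypothetical placeholders, not the
weighting scheme proposed in the article.
"""


def observation_weights(train_leaves, query_leaves, tree_weights=None):
    """Return w_i(x), the weight each training observation receives for query x.

    train_leaves[t][i] is the leaf of training observation i in tree t and
    query_leaves[t] is the leaf the query point falls into in tree t. With
    tree_weights=None every tree counts equally (Meinshausen's original QRF).
    """
    n_trees = len(train_leaves)
    n_obs = len(train_leaves[0])
    if tree_weights is None:
        tree_weights = [1.0] * n_trees
    total = float(sum(tree_weights))
    weights = [0.0] * n_obs
    for t in range(n_trees):
        # Training observations that share the query's leaf in tree t.
        members = [i for i in range(n_obs) if train_leaves[t][i] == query_leaves[t]]
        if not members:
            raise ValueError(f"query leaf in tree {t} contains no training data")
        # Each tree spreads its normalized weight evenly over its leaf members.
        share = tree_weights[t] / total / len(members)
        for i in members:
            weights[i] += share
    return weights


def weighted_quantile(y, w, alpha):
    """Smallest y with weighted CDF F(y | x) = sum_i w_i * 1{Y_i <= y} >= alpha."""
    cumulative = 0.0
    for value, weight in sorted(zip(y, w)):
        cumulative += weight
        if cumulative >= alpha - 1e-12:  # tolerance for floating-point sums
            return value
    return max(y)


def weighted_mean(y, w):
    """Conditional mean estimate, i.e. what a mean-oriented model targets."""
    return sum(v * wt for v, wt in zip(y, w))


# Hypothetical toy LGD values with two clusters: near 0 and near 1.
lgd = [0.0, 0.05, 0.1, 0.9, 0.95, 1.0]
train_leaves = [
    [0, 0, 0, 1, 1, 1],  # tree 0
    [0, 0, 1, 1, 2, 2],  # tree 1
    [0, 1, 1, 2, 2, 2],  # tree 2
]
query_leaves = [0, 1, 2]  # leaf of the query point in each tree

w = observation_weights(train_leaves, query_leaves)
print("mean:", round(weighted_mean(lgd, w), 3))
for a in (0.1, 0.5, 0.9):
    print(f"q{a:.1f}:", weighted_quantile(lgd, w, a))
```

With uniform tree weights, the toy query receives a weighted mean LGD of 0.5 even though no training LGD lies between 0.1 and 0.9, whereas the 0.1, 0.5 and 0.9 quantiles (0.0, 0.1 and 1.0) show that the predictive distribution keeps mass near both ends. Passing non-uniform `tree_weights` changes how much each tree's leaf contributes without altering the quantile read-out.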
