Paper information - Effects of missing data in credit risk scoring. A comparative analysis of methods to achieve robustness in the absence of sufficient data

Abstract: The 2004 Basel II Accord pointed out the benefits of credit risk management through internal models that use internal data to estimate risk components: probability of default (PD), loss given default, exposure at default and maturity. Internal data are the primary data source for PD estimates; banks are permitted to use statistical default prediction models to estimate borrowers' PD, subject to requirements concerning the accuracy, completeness and appropriateness of data. However, in practice, internal records are usually incomplete or do not contain adequate history to estimate the PD. Missing data are particularly critical for low default portfolios, which are characterised by inadequate default records, making it difficult to design statistically significant prediction models. Several methods might be used to deal with missing data, such as list-wise deletion, application-specific list-wise deletion, substitution techniques or imputation models (simple and multiple variants). List-wise deletion is an easy-to-use method widely applied by social scientists, but it loses substantial data and reduces the diversity of information, resulting in bias in the model's parameters, results and inferences. The choice of the best method to solve the missing data problem largely depends on the nature of the missing values, that is, whether they are missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR), but there is a lack of empirical analysis of their effect on credit risk, which limits the validity of the resulting models. In this paper, we analyse the nature and effects of missing data in credit risk modelling (MCAR, MAR and MNAR processes) and take into account a current scarce data set on consumer borrowers, which includes different percentages and distributions of missing data.
The findings are used to analyse the performance of several methods for dealing with missing data, such as list-wise deletion, simple imputation methods, maximum likelihood estimation (MLE) models and advanced multiple imputation (MI) alternatives based on Markov chain Monte Carlo and re-sampling methods. The resulting models are evaluated and compared in terms of robustness, accuracy and complexity. In particular, MI models are found to provide very valuable solutions to the problem of missing data in credit risk.
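The dependence of the best method on the missingness mechanism can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the paper's data set or its models: the variables `age` and `income`, the linear relationship between them, the missingness rates, and all function names are assumptions made purely for illustration. It compares list-wise deletion with a single regression imputation (a simple imputation method, not full multiple imputation) when `income` is masked under an MCAR process and under a MAR process that depends on the observed `age`.

```python
import random
import statistics


def simulate(n, rng):
    """Synthetic borrowers as (age, income); income rises with age (assumed)."""
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 70)
        income = 1000 + 40 * age + rng.gauss(0, 300)
        rows.append((age, income))
    return rows


def mask_mcar(rows, rng, rate=0.4):
    """MCAR: every income value is dropped with the same probability."""
    return [(age, None if rng.random() < rate else inc) for age, inc in rows]


def mask_mar(rows, rng, max_rate=0.8):
    """MAR: the chance of a missing income depends only on the observed age."""
    masked = []
    for age, inc in rows:
        p_missing = max_rate * (age - 20) / 50
        masked.append((age, None if rng.random() < p_missing else inc))
    return masked


def listwise_mean(rows):
    """List-wise deletion: estimate mean income from complete rows only."""
    return statistics.fmean(inc for _, inc in rows if inc is not None)


def regression_imputed_mean(rows):
    """Fit income ~ age on complete rows, fill the gaps, then average."""
    complete = [(age, inc) for age, inc in rows if inc is not None]
    mean_age = statistics.fmean(age for age, _ in complete)
    mean_inc = statistics.fmean(inc for _, inc in complete)
    sxy = sum((age - mean_age) * (inc - mean_inc) for age, inc in complete)
    sxx = sum((age - mean_age) ** 2 for age, _ in complete)
    slope = sxy / sxx
    intercept = mean_inc - slope * mean_age
    filled = [inc if inc is not None else intercept + slope * age
              for age, inc in rows]
    return statistics.fmean(filled)


rng = random.Random(42)
data = simulate(20000, rng)
mcar = mask_mcar(data, rng)
mar = mask_mar(data, rng)

results = {
    "true mean": statistics.fmean(inc for _, inc in data),
    "MCAR, list-wise deletion": listwise_mean(mcar),
    "MAR, list-wise deletion": listwise_mean(mar),
    "MAR, regression imputation": regression_imputed_mean(mar),
}
for name, value in results.items():
    print(f"{name:28s} {value:10.1f}")
```

In this setup, list-wise deletion should stay close to the true mean under MCAR but drift downwards under MAR, because older (higher-income) borrowers are dropped more often, while the regression-based fill recovers most of the gap. A single deterministic imputation like this understates uncertainty; multiple imputation addresses that by drawing several plausible completions and pooling the estimates.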

Raquel Flórez-López
