Advances in Multivariate Back-testing for Credit Risk Underestimation

When back-testing the calibration quality of rating systems, two-sided statistical tests can detect both over- and underestimation of credit risk. Some users, however, such as risk-averse investors and regulators, are primarily interested in the underestimation of risk and thus require one-sided tests. The established one-sided tests are multiple tests, which assess each rating class of the rating system separately and then combine the results into an overall assessment. However, these multiple tests may fail to detect underperformance of the rating system as a whole. Aiming to improve the overall assessment of rating systems, this paper presents a set of one-sided tests that assess the performance of all rating classes jointly. These joint tests build on the method of Sterne [1954] for ranking possible outcomes by probability, which allows back-testing to be extended to a setting of multiple rating classes. The new joint tests are compared to the most established one-sided multiple test and are shown to outperform this benchmark in terms of power and size of the acceptance region.

JEL Classification: C12, C52, G21, G24
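The idea of ranking possible outcomes by probability can be illustrated with a minimal Python sketch. It is not the paper's joint one-sided test, whose construction is not given in the abstract. It only shows the generic ranking step: every possible vector of default counts is scored by its probability under the forecast default probabilities, and the most probable outcomes are collected until they cover at least 1 - alpha. The function name `probability_ranked_region`, the input layout, and the assumption of independent binomial defaults within and across rating classes are all hypothetical choices made for this illustration.

```python
from itertools import product
from math import comb, prod


def binom_pmf(k, n, p):
    """Probability of exactly k defaults among n obligors with default probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def probability_ranked_region(classes, alpha=0.05):
    """Collect the most probable default-count vectors until coverage >= 1 - alpha.

    classes: list of (number_of_obligors, forecast_pd), one pair per rating class.
    Assumption (not from the paper): defaults are independent binomial draws,
    so the joint probability is the product of the per-class probabilities.
    """
    # Enumerate every possible vector of default counts (feasible for small portfolios only).
    outcomes = product(*(range(n + 1) for n, _ in classes))
    scored = [
        (prod(binom_pmf(k, n, p) for k, (n, p) in zip(ks, classes)), ks)
        for ks in outcomes
    ]
    # Rank outcomes from most to least probable.
    scored.sort(key=lambda item: item[0], reverse=True)

    region, coverage = [], 0.0
    for prob, ks in scored:
        if coverage >= 1 - alpha:
            break
        region.append(ks)
        coverage += prob
    return sorted(region), coverage


if __name__ == "__main__":
    # Two hypothetical rating classes: 10 obligors each, forecast PDs of 10% and 20%.
    region, coverage = probability_ranked_region([(10, 0.10), (10, 0.20)])
    print(len(region), round(coverage, 4))
```

An observed default-count vector outside the returned region would be unusual under the forecast probabilities. A one-sided variant would additionally have to restrict rejection to outcomes that indicate underestimated risk, which this sketch does not attempt.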

[1] Walter L. Smith. Probability and Statistics, 1959, Nature.

[2] Michael D. Perlman, et al. One-Sided Testing Problems in Multivariate Analysis, 1969.

[3] E. S. Pearson, et al. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial, 1934.

[4] Markus Leippold, et al. Economic Benefit of Powerful Credit Scoring, 2005.

[5] T. E. Sterne, et al. Some remarks on confidence or fiducial limits, 1954.

[6] Stephen E. Fienberg, et al. Testing Statistical Hypotheses, 2005.

[7] J. Reiczigel, et al. Confidence intervals for the binomial parameter: some new considerations, 2003, Statistics in Medicine.

[8] R. Jankowitsch, et al. Modelling the Economic Value of Credit Rating Systems, 2004.

[9] Florian Resch, et al. Pitfalls and remedies in testing the calibration quality of rating systems, 2011.

[10] A. Agresti, et al. Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions, 1998.

[11] Sebastian Döhler, et al. Validation of credit default probabilities via multiple testing procedures, 2010.

[12] Carolyn Moclair. Validation of credit default probabilities using multiple-testing procedures, 2010.

[13] D. J. Bartholomew, et al. A Test of Homogeneity for Ordered Alternatives. II, 1959.

[14] David A. van Dyk, et al. The Role of Statistics in the Discovery of a Higgs Boson, 2014.

[15] Manuel Lingo, et al. Discriminatory Power - An Obsolete Validation Criterion?, 2008.

[16] Kurt Hornik, et al. Validation of Credit Rating Systems Using Multi-Rater Information, 2006.

[17] L. D. Fisher, et al. The use of one-sided tests in drug trials: an FDA advisory committee member's perspective, 1991, Journal of Biopharmaceutical Statistics.

[18] J. I The Design of Experiments, 1936, Nature.

[19] Fernando González, et al. The Performance of Credit Rating Systems in the Assessment of Collateral Used in Eurosystem Monetary Policy Operations, 2007, SSRN Electronic Journal.

[20] S. E. Vollset, et al. Confidence intervals for a binomial proportion, 1994, Statistics in Medicine.

[21] P. Bauer, et al. Multiple testing in clinical trials, 1991, Statistics in Medicine.

[22] P. O'Brien. Procedures for comparing samples with multiple endpoints, 1984, Biometrics.

[23] P. Westfall, et al. Multiple Tests with Discrete Distributions, 1997.

[24] Jenő Reiczigel, et al. An exact confidence set for two binomial proportions and exact unconditional confidence intervals for the difference and ratio of proportions, 2008, Comput. Stat. Data Anal.

[25] Alexander J. McNeil, et al. Dependent defaults in models of portfolio credit risk, 2003.

[26] Martin Weber, et al. Generally accepted rating principles: A primer, 2001.