Logistic sub-models for small size populations in credit scoring

Credit scoring is a fast-growing area of risk management, driven by the volume of consumer credit requests. Requests from new and existing customers alike are often evaluated with classical discrimination rules based on customer information. These strategies have serious limitations, however: they do not take into account the differences in characteristics between current customers and future ones. The aim of this paper is to measure the creditworthiness of non-customer borrowers and to model potential risk in a heterogeneous population made up of borrowers who are customers of the bank and borrowers who are not. We build on previous work on generalized discrimination and transpose it to the logistic model to derive efficient discrimination rules for the non-customer subpopulation. We thus obtain seven simple models linking the parameters of the two logistic models associated with the two subpopulations. The German credit data set is used as experimental data to compare the seven models. Experimental results show that exploiting the links between the two subpopulations improves classification accuracy for new loan applicants.
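
To make the idea of linking the parameters of two logistic models concrete, here is a minimal Python sketch of one possible link. The abstract does not list the seven models, so the link used here (non-customer slopes equal to a scalar multiple of the customer slopes, with a free intercept) is an illustrative assumption rather than one of the paper's models. The function names, the synthetic data, and the sample sizes are likewise hypothetical; the German credit data set is not used.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Newton-Raphson fit of logit P(y=1|x) = b0 + b^T x; returns (b0, b).

    A tiny ridge term keeps the Hessian invertible on small samples.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Z @ w, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = Z.T @ (y - p) - ridge * w
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        w = w + np.linalg.solve(hess, grad)
    return w[0], w[1:]

def fit_linked_submodel(X_small, y_small, beta_large):
    """Assumed link: beta_small = lam * beta_large, intercept left free.

    Only two parameters (intercept and lam) are estimated on the small
    sample, by regressing the labels on the one-dimensional customer score.
    """
    score = X_small @ beta_large
    a, lam_vec = fit_logistic(score[:, None], y_small)
    lam = lam_vec[0]
    return a, lam, lam * beta_large

# Synthetic illustration (hypothetical data).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 2.0])

def simulate(n, intercept, beta):
    X = rng.normal(size=(n, beta.size))
    p = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
    return X, (rng.random(n) < p).astype(float)

X_cust, y_cust = simulate(2000, 0.0, beta_true)       # large customer sample
X_new, y_new = simulate(40, -0.5, 0.5 * beta_true)    # small non-customer sample

_, beta_cust = fit_logistic(X_cust, y_cust)
a_new, lam, beta_new = fit_linked_submodel(X_new, y_new, beta_cust)
```

Under this assumed link, the small sample has to support only two estimated parameters instead of one per covariate, which is the general motivation for borrowing strength from the larger subpopulation.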
