Best classification algorithms in peer-to-peer lending

Abstract: A sound credit scoring technique is vital to the long-term success of financial institutions of all kinds, including peer-to-peer (P2P) lending platforms. The main contribution of our paper is a robust ranking of 10 classification techniques on a real-world P2P lending data set. The data set comes from Lending Club, covers the 2009–2013 period, and contains 212,252 records and 23 variables. Unlike other researchers, we use a data sample in which the final loan resolution is known for every loan. We evaluate the classifiers using 5-fold cross-validation and 6 classification performance measures. Our results show that logistic regression, artificial neural networks, and linear discriminant analysis are the three best algorithms on the Lending Club data. Conversely, k-nearest neighbors and classification and regression trees are the two worst classification methods.
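The evaluation protocol named in the abstract (5-fold cross-validation scored with several performance measures) can be illustrated with a minimal sketch. This is not the authors' code. The two toy classifiers (`fit_base_rate`, `fit_median_split`), the toy data, and the choice of accuracy and Brier score as the two measures shown are all assumptions made for illustration; the abstract does not say which six measures were used.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Split range(n) into k nearly equal folds; shuffle first unless seed is None."""
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, p_default, threshold=0.5):
    """Share of loans whose thresholded prediction matches the true outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_default))
    return hits / len(y_true)


def brier_score(y_true, p_default):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_default)) / len(y_true)


def cross_validate(X, y, fit, k=5, seed=0):
    """Average both measures over k held-out folds.

    fit(X_train, y_train) must return a function mapping one row to P(default).
    """
    acc, brier = [], []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in held_out]
        p_test = [predict(X[i]) for i in held_out]
        acc.append(accuracy(y_test, p_test))
        brier.append(brier_score(y_test, p_test))
    return {"accuracy": sum(acc) / k, "brier": sum(brier) / k}


def fit_base_rate(X_train, y_train):
    """Hypothetical baseline: predict the training default rate for every loan."""
    rate = sum(y_train) / len(y_train)
    return lambda row: rate


def fit_median_split(X_train, y_train):
    """Hypothetical one-feature model: default rate below and above the median."""
    cut = sorted(row[0] for row in X_train)[len(X_train) // 2]
    low = [t for row, t in zip(X_train, y_train) if row[0] < cut]
    high = [t for row, t in zip(X_train, y_train) if row[0] >= cut]
    low_rate = sum(low) / len(low) if low else 0.0
    high_rate = sum(high) / len(high) if high else 0.0
    return lambda row: high_rate if row[0] >= cut else low_rate


if __name__ == "__main__":
    # Toy data (an assumption): one risk feature; the riskiest 20% of loans default.
    X = [[i / 100] for i in range(100)]
    y = [1 if i >= 80 else 0 for i in range(100)]
    for name, fit in [("base rate", fit_base_rate), ("median split", fit_median_split)]:
        print(name, cross_validate(X, y, fit))
```

On this toy data both models reach the same accuracy, yet the Brier score separates them, which shows why a ranking built on several measures can differ from one built on accuracy alone.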
