Peer-to-peer loan acceptance and default prediction with artificial intelligence

Logistic regression (LR) and support vector machine (SVM) algorithms, together with linear and nonlinear deep neural networks (DNNs), are applied to lending data to replicate lender acceptance of loans and to predict the likelihood of default of issued loans. A two-phase model is proposed: the first phase predicts loan rejection, and the second predicts default risk for approved loans. LR was the best performer in the first phase, with a test-set macro-averaged recall of 77.4%. DNNs were applied to the second phase only, where they achieved the best performance, with a test-set recall of 72% for defaults. This shows that artificial intelligence can improve current credit risk models, reducing the default risk of issued loans by as much as 70%. The models were also applied to loans taken for small businesses alone. The first phase of the model performs significantly better when trained on the whole dataset, whereas the second phase performs significantly better when trained on the small business subset. This suggests a potential discrepancy between how these loans are screened and how they should be analysed in terms of default prediction.
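
The macro-averaged recall reported for the first phase can be illustrated with a minimal sketch. It assumes the standard definition (the unweighted mean of per-class recall); the function names and the toy labels below are hypothetical and are not taken from the paper or its data.

```python
from collections import defaultdict


def recall_per_class(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    hits = defaultdict(int)    # correctly predicted samples per true label
    totals = defaultdict(int)  # all samples per true label
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return {label: hits[label] / totals[label] for label in totals}


def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls (assumed definition)."""
    per_class = recall_per_class(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


# Hypothetical, imbalanced toy labels: 1 = rejected, 0 = accepted.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(recall_per_class(y_true, y_pred))
print(macro_recall(y_true, y_pred), accuracy)
```

Because each class contributes equally to the average, poor recall on the minority class lowers the score even when overall accuracy looks high, which is one plausible reason to prefer this metric on imbalanced lending data.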
