Predicting Default Risk on Peer-to-Peer Lending Imbalanced Datasets

In the past few years, peer-to-peer (P2P) lending has grown rapidly worldwide. Its main idea is disintermediation: removing intermediaries such as banks. For small businesses and individuals without sufficient credit or credit history, P2P lending is a good way to apply for a loan. However, the fundamental problem of P2P lending is information asymmetry, which makes it difficult to estimate the default risk of a loan correctly. Lenders decide whether to fund a loan based only on the information provided by borrowers, and the resulting P2P lending data are imbalanced, containing unequal numbers of fully paid and defaulted loans. Imbalanced datasets are common in the real world, for example fraudulent credit card transactions or defective products in a plant. Unfortunately, imbalanced data are ill-suited to standard machine learning schemes. In our scenario, models without any adaptive methods would focus on learning the majority class of normal repayment, yet the characteristics of the minority class, default, are critical in the lending business. In this study, we use several machine learning schemes to predict the default risk of P2P lending, together with re-sampling and cost-sensitive mechanisms to handle the imbalanced datasets. We validate the proposed scheme on datasets from Lending Club. The experimental results show that the proposed scheme effectively raises the prediction accuracy for default risk.
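
The abstract does not state which re-sampling or cost-sensitive methods are used, so the following is only a minimal sketch of the two general ideas, under stated assumptions. It assumes random over-sampling of the minority class and inverse-frequency class weights; the helper names `random_oversample` and `class_weights` and the toy data are hypothetical and do not come from the study.

```python
import random
from collections import Counter


def class_weights(labels):
    """Cost-sensitive idea: weight each class by n_samples / (n_classes * class_count),
    so that errors on the rare class cost more during training."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Re-sampling idea: duplicate randomly chosen minority rows until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumption): 8 fully paid loans (label 0) and 2 defaults (label 1).
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2

weights = class_weights(labels)                          # default class gets the larger weight
bal_rows, bal_labels = random_oversample(rows, labels)   # classes now equally represented
```

In practice, the weights would be passed to a classifier's loss function, while the over-sampled rows would replace the training set only; the test set should keep its original class distribution so that the evaluation reflects real default rates.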
