Application of Instance-Based Entropy Fuzzy Support Vector Machine in Peer-To-Peer Lending Investment Decision

Loan status prediction is an effective tool for investment decisions in the peer-to-peer (P2P) lending market. In this market, most borrowers fulfill their repayment plans, but some fail to pay back their loans. Because defaults form the minority class, an imbalanced classification method can be used to identify default borrowers. In this context, this paper proposes an investment decision model for the P2P lending market that builds on loans classified as fully paid by the instance-based entropy fuzzy support vector machine (IEFSVM). IEFSVM modifies the existing entropy fuzzy support vector machine (EFSVM) with an instance-based scheme: rather than fixing a single neighborhood size for all instances, it reflects how the entropy of an instance's nearest neighbors changes as the neighborhood size changes. IEFSVM can therefore account for changes in the class composition of the nearest neighbors when determining fuzzy membership. Applying the model to the Lending Club dataset, we identify loans that are predicted to be fully paid. We then use a multiple regression model to build an investment portfolio from the predicted non-default loans that are expected to yield high returns. The empirical results show that, for loan status classification, IEFSVM outperforms not only EFSVM but also six other state-of-the-art classifiers: cost-sensitive adaptive boosting, cost-sensitive random forest, EasyEnsemble, random undersampling boosting, weighted extreme learning machine, and cost-sensitive extreme gradient boosting. In addition, the investment performance of the multiple regression model using IEFSVM is higher and more robust than that of two other benchmarks. We conclude that the proposed investment model is a practical approach to supporting decisions in the P2P lending market.
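The idea of tracking nearest-neighbor entropy across neighborhood sizes can be illustrated with a minimal sketch. The function below computes, for one instance, the binary class entropy among its k nearest neighbors for several values of k. The function name, the Euclidean distance, the base-2 logarithm, and the toy data are assumptions made for illustration; this is not the authors' implementation, and the step that converts such an entropy profile into a fuzzy membership is not shown because the abstract does not specify it.

```python
import numpy as np


def neighbor_entropy_profile(X, y, index, k_values):
    """Class entropy among the k nearest neighbors of X[index], for each k.

    Hypothetical helper: assumes binary labels in {0, 1} and Euclidean distance.
    """
    dists = np.linalg.norm(X - X[index], axis=1)
    order = np.argsort(dists, kind="stable")
    order = order[order != index]  # exclude the instance itself
    profile = []
    for k in k_values:
        labels = y[order[:k]]
        p_pos = np.mean(labels == 1)
        probs = np.array([p_pos, 1.0 - p_pos])
        probs = probs[probs > 0]  # drop zero terms to avoid log(0)
        profile.append(float(-np.sum(probs * np.log2(probs))))
    return profile


# Toy data (assumed): five points on a line, three of class 1 and two of class 0.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, 1, 0, 0])
profile = neighbor_entropy_profile(X, y, index=0, k_values=[1, 2, 4])
print(profile)
```

In this toy case the entropy is zero for small neighborhoods, where all neighbors share one class, and rises once the neighborhood grows to include the other class. A single fixed neighborhood size would capture only one point of that profile.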
