FUZZY MULTI-CRITERIA RANDOM SEED AND CUTOFF POINT APPROACH FOR CREDIT RISK ASSESSMENT

Data mining classification techniques have been studied extensively for credit risk assessment. Existing techniques use 0.5 as the default cutoff for predicting binary outcomes, irrespective of dataset and classifier, which limits their classification performance on datasets with imbalanced group sizes. This paper addresses two key problems with existing techniques and discusses the advantages of applying a Multiple Criteria Decision Making (MCDM) technique to multiple evaluation criteria. The first problem is applying the default cutoff irrespective of dataset and classifier. The second is relying on a single criterion to evaluate classification performance and to predict the cutoff point. This work identifies the best cutoff point for each dataset and classifier, and integrates MCDM under a fuzzy environment into all data mining stages of evaluation in order to make better decisions over multiple criteria. It also covers the selection of the initial random seed in the clustering phase for better cluster quality, and Best Seed Clustering combined Classification (the BSCC hybrid algorithm) with selected features to improve classification performance. Integrating these techniques improves cluster quality and classification performance scores with respect to datasets and classifiers, because the cutoff point varies from dataset to dataset and from classifier to classifier. Experimental results on a credit dataset from the UCI Machine Learning Repository are found to be competitive, and the proposed BSCC hybrid algorithm increases the performance score at the obtained cutoff point over the non-hybrid approach with the default cutoff.
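
To illustrate why a fixed 0.5 cutoff can hurt on imbalanced data, the following minimal sketch scans candidate cutoffs and keeps the one that maximizes a single criterion. It is not the method described above: the scores and labels are made-up toy values, the function names are hypothetical, and balanced accuracy stands in for the multiple criteria that the paper combines through MCDM.

```python
# Illustrative sketch only: toy data and hypothetical helper names.
# A single criterion (balanced accuracy) replaces the paper's MCDM ranking.

def confusion(scores, labels, cutoff):
    """Count TP, FP, TN, FN when predicting class 1 for score >= cutoff."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def balanced_accuracy(scores, labels, cutoff):
    """Mean of sensitivity and specificity at the given cutoff."""
    tp, fp, tn, fn = confusion(scores, labels, cutoff)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def best_cutoff(scores, labels, candidates=None):
    """Return the candidate cutoff with the highest balanced accuracy."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    return max(candidates, key=lambda c: balanced_accuracy(scores, labels, c))


# Toy, imbalanced example: 8 good applicants (0) and 3 defaulters (1).
scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.45, 0.28, 0.40, 0.60]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

cutoff = best_cutoff(scores, labels)
print("default 0.5 :", round(balanced_accuracy(scores, labels, 0.5), 4))
print("best cutoff :", cutoff, round(balanced_accuracy(scores, labels, cutoff), 4))
```

On this toy data the selected cutoff falls well below 0.5, because the minority class receives low predicted scores. In practice the scan would be run per dataset and per classifier on held-out data, and several criteria would be ranked together rather than one.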
