Evaluating and Tuning Predictive Data Mining Models Using Receiver Operating Characteristic Curves

In this study, we conduct an empirical analysis of the performance of five popular data mining methods (neural networks, logistic regression, linear discriminant analysis, decision trees, and nearest neighbor) on two binary classification problems from the credit evaluation domain. Whereas most studies comparing data mining methods have employed accuracy as a performance measure, we argue that, for problems such as credit evaluation, the focus should be on minimizing misclassification cost. We first generate receiver operating characteristic (ROC) curves for the classifiers and use the area under the curve (AUC) measure to compare the aggregate performance of the five methods over the spectrum of decision thresholds. Next, using the ROC results, we propose a method for tuning the classifiers by identifying optimal decision thresholds. We compare the methods based on expected costs across a range of cost-probability ratios. In addition to expected cost and AUC, we evaluate the models on the basis of their generalizability to unseen data, their scalability to other problems in the domain, and their robustness against changes in class distributions. We found that the performance of logistic regression and neural network models was superior under most conditions. In contrast, decision tree and nearest neighbor models yielded higher costs, and were much less generalizable and robust than the other models. An important finding of this research is that the models can be effectively tuned post hoc to make them cost sensitive, even though they were built without incorporating misclassification costs.
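The threshold-tuning idea described above can be sketched in code. The following is an illustrative example, not the authors' implementation: given classifier scores and true labels, it builds the ROC curve, computes AUC by the trapezoidal rule, and then selects the decision threshold minimizing expected misclassification cost for assumed false-negative and false-positive costs and an assumed class prior. The sample data, cost values, and prior are hypothetical.

```python
# Sketch of post-hoc, cost-sensitive threshold tuning via the ROC curve.

def roc_points(labels, scores):
    """Return (FPR, TPR, threshold) triples, sweeping thresholds high to low."""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0, float("inf"))]
    for score, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos, score))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def best_threshold(points, p_pos, cost_fn, cost_fp):
    """Pick the ROC point minimizing expected cost:
    cost_fp * FPR * P(neg) + cost_fn * (1 - TPR) * P(pos)."""
    p_neg = 1.0 - p_pos
    return min(points,
               key=lambda p: cost_fp * p[0] * p_neg + cost_fn * (1 - p[1]) * p_pos)

# Hypothetical scores from a trained classifier and true labels.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.1]
pts = roc_points(labels, scores)
print("AUC:", auc(pts))  # aggregate performance over all thresholds

# Assume false negatives (bad loans approved) cost 5x false positives.
fpr, tpr, thr = best_threshold(pts, p_pos=0.5, cost_fn=5.0, cost_fp=1.0)
print("optimal threshold:", thr)
```

Note that the classifier is trained without any reference to costs; the costs and class prior enter only at threshold-selection time, which is what makes the tuning post hoc in the sense the abstract describes.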
