Rule Extraction from Neural Networks and Support Vector Machines for Credit Scoring

In this chapter we describe how comprehensible rules can be extracted from artificial neural networks (ANN) and support vector machines (SVM). ANN and SVM are two popular techniques for pattern classification. In the business intelligence domain of credit scoring, both have proven effective at distinguishing good credit risks from bad ones, and their accuracy is often higher than that of decision tree methods. Unlike decision trees, however, the classifications made by ANN and SVM are difficult for end-users to understand, because their outputs are computed as nonlinear mappings of the input attributes. We describe two rule extraction methods that we have developed to overcome this difficulty. These methods enable users to obtain comprehensible propositional rules from trained ANN and SVM models. Such rules can be readily verified by domain experts and lead to a better understanding of the data at hand. A brief sketch of the general idea follows.
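
To make the idea concrete, here is a minimal sketch of the pedagogical style of rule extraction, assuming scikit-learn is available. This is not the authors' specific method; it only illustrates the common pattern: treat the trained ANN (or SVM) as a black-box oracle, relabel the training inputs with its predictions, and fit an interpretable surrogate whose root-to-leaf paths read as propositional rules. The synthetic data set and all names below are illustrative.

```python
# Hedged sketch: pedagogical rule extraction from a black-box classifier.
# Not the chapter's actual algorithm; a generic illustration only.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for a credit-scoring data set (features -> good/bad risk).
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

# Train the opaque model: an ANN here; an SVM (sklearn.svm.SVC) would be
# handled identically, since only its predictions are used below.
black_box = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                          random_state=0).fit(X, y)

# Relabel the inputs with the black box's own predictions, so the surrogate
# mimics the model's decision boundary rather than the raw (noisy) labels.
y_model = black_box.predict(X)

# Fit a shallow decision tree on the model's predictions; the depth limit
# keeps the extracted rule set small and comprehensible.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_model)

# Each root-to-leaf path is a propositional rule, e.g.
# "IF x3 <= 0.42 AND x1 > -0.80 THEN class = bad risk".
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(6)]))

# Fidelity: the fraction of inputs on which the rules reproduce the
# black box's decisions (a standard quality measure for extracted rules).
print("fidelity:", (surrogate.predict(X) == y_model).mean())
```

In this pedagogical setup, rule quality is judged by fidelity to the black box as well as by accuracy on the original labels; decompositional methods, by contrast, derive rules from the model's internal weights or support vectors rather than from its input-output behavior alone.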
