Rule Extraction from Support Vector Machines: An Overview of Issues and Application in Credit Scoring

Innovative storage technology and the rising popularity of the Internet have generated an ever-growing amount of data. Much valuable knowledge is contained in this data, yet it remains hidden. The Support Vector Machine (SVM) is a state-of-the-art classification technique that generally provides accurate models, as it is able to capture non-linearities in the data. However, this strength is also its main weakness, as the generated non-linear models are typically regarded as incomprehensible black-box models. By extracting rules that mimic the black box as closely as possible, we can provide some insight into the logic of the SVM model. This explanation capability is of crucial importance in any domain where the model needs to be validated before being implemented, such as credit scoring (loan default prediction) and medical diagnosis. If the SVM is regarded as the current state of the art, SVM rule extraction can be the state of the art of the (near) future. This chapter provides an overview of recently proposed SVM rule extraction techniques, complemented with the pedagogical Artificial Neural Network (ANN) rule extraction techniques, which are also suitable for SVMs. Issues related to this topic are the different rule outputs and their corresponding expressiveness; the focus on high-dimensional data, as SVM models typically perform well on such data; and the requirement that the extracted rules be in line with existing domain knowledge. These issues are explained and further illustrated with a credit scoring case, where we extract a Trepan tree and a RIPPER rule set from the generated SVM model. The benefit of decision tables in a rule extraction context is also demonstrated. Finally, some interesting alternatives to SVM rule extraction are listed.
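The pedagogical idea mentioned above, treating the trained model as a black box and learning rules from its predictions rather than from the original labels, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hand-picked "support vectors", the `GAMMA` and `BIAS` values, and the helper names `svm_predict` and `extract_stump` are hypothetical, and the single-threshold rule learner is a deliberately simple stand-in, not Trepan or RIPPER. Fidelity here means the fraction of points on which the extracted rule agrees with the black box.

```python
import math
import random

# Hypothetical stand-in for a trained SVM: an RBF-kernel decision function
# with hand-picked (support vector, alpha_i * y_i) pairs. Illustrative values only.
SUPPORT_VECTORS = [
    ((1.0, 1.0), 1.0),
    ((1.0, 3.0), 1.0),
    ((3.0, 1.0), -1.0),
    ((3.0, 3.0), -1.0),
]
GAMMA = 0.5
BIAS = 0.0


def svm_predict(x):
    """Black-box prediction: sign of sum_i coef_i * K(sv_i, x) + b."""
    score = BIAS
    for sv, coef in SUPPORT_VECTORS:
        dist2 = sum((a - b) ** 2 for a, b in zip(sv, x))
        score += coef * math.exp(-GAMMA * dist2)
    return 1 if score >= 0 else -1


def extract_stump(points, labels):
    """Search for the rule 'if x[f] <= t then c else -c' with the highest fidelity."""
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            for c in (1, -1):
                agree = sum(
                    (c if p[f] <= t else -c) == y for p, y in zip(points, labels)
                )
                fidelity = agree / len(points)
                if best is None or fidelity > best[0]:
                    best = (fidelity, f, t, c)
    return best


random.seed(0)
points = [(random.uniform(0, 4), random.uniform(0, 4)) for _ in range(200)]
# Pedagogical step: label the data with the black box, not with the true classes.
labels = [svm_predict(p) for p in points]

fidelity, feature, threshold, left_class = extract_stump(points, labels)
print(
    f"if x[{feature}] <= {threshold:.2f} then {left_class} "
    f"else {-left_class} (fidelity {fidelity:.2f})"
)
```

Because the rule learner only ever queries `svm_predict`, the same procedure would apply unchanged to any other opaque classifier, which is why pedagogical ANN techniques carry over to SVMs. A richer rule language (trees, rule sets) would replace `extract_stump` in practice.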
