Rule extraction from linear support vector machines

We describe an algorithm for converting linear support vector machines, and any other hyperplane-based linear classifiers, into a set of non-overlapping rules that, unlike the original classifier, can be easily interpreted by humans. Each iteration of the rule-extraction algorithm is formulated as a constrained optimization problem that is computationally inexpensive to solve. We discuss various properties of the algorithm and provide proofs of convergence for two different optimization criteria. We demonstrate the performance and speed of the algorithm on linear classifiers learned from real-world datasets, including a medical dataset for detecting lung cancer in medical images. The ability to convert SVMs and other "black-box" classifiers into a set of human-understandable rules is critical not only for physician acceptance, but also for reducing the regulatory barriers faced by medical decision-support systems based on such classifiers.
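To give a rough flavor of what "non-overlapping rules" means for a hyperplane classifier, consider a linear decision function w·x ≥ b. An axis-aligned rule of the form (l₁ ≤ x₁ ≤ u₁) ∧ … ∧ (lₙ ≤ xₙ ≤ uₙ) is consistent with the classifier exactly when the whole box lies inside the positive halfspace, which can be verified by testing a single worst-case corner. The sketch below illustrates only this membership check, not the paper's actual volume-maximization procedure; the function name and example numbers are hypothetical.

```python
import numpy as np

def box_in_halfspace(w, b, lo, hi):
    """Check whether the axis-aligned box [lo, hi] lies entirely inside
    the positive halfspace {x : w.x >= b} of a linear classifier.

    It suffices to test the box corner minimizing w.x: take the lower
    bound on coordinates with nonnegative weight and the upper bound on
    coordinates with negative weight. (Illustrative helper, not the
    algorithm from the paper.)
    """
    w, lo, hi = map(np.asarray, (w, lo, hi))
    worst_corner = np.where(w >= 0, lo, hi)
    return float(w @ worst_corner) >= b

# A box that fits inside the halfspace x1 + x2 >= 1 ...
print(box_in_halfspace([1.0, 1.0], 1.0, [0.6, 0.6], [1.0, 1.0]))  # True
# ... and one that sticks out of it.
print(box_in_halfspace([1.0, 1.0], 1.0, [0.0, 0.0], [1.0, 1.0]))  # False
```

A rule extracted this way never misclassifies relative to the hyperplane; the optimization in the paper chooses such boxes so that, iteration by iteration, they tile the positive halfspace without overlapping.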
