Active Learning-Based Pedagogical Rule Extraction

Many state-of-the-art data mining techniques introduce nonlinearities into their models to cope effectively with complex data relationships. Although such techniques are consistently ranked among the top classification techniques in terms of predictive power, their lack of transparency renders them useless in domains where comprehensibility is important. Rule-extraction algorithms remedy this by distilling, from such complex models, comprehensible rule sets that explain how the classifications are made. This paper presents a new rule-extraction technique based on active learning. The technique generates artificial data points around training instances for which the black-box model's output score has low confidence, and then has the black-box model label these points. The main novelty of the proposed method is that it uses a pedagogical approach without making any architectural assumptions about the underlying model; it can therefore be applied to any black-box technique. Furthermore, it can generate any rule format, depending on the chosen underlying rule-induction technique. In a large-scale empirical study, we demonstrate the ability of our technique to extract trees and rules from artificial neural networks, support vector machines, and random forests, on 25 data sets of varying size and dimensionality. Our results show that not only do the generated rules explain the black-box models well (thereby facilitating the acceptance of such models), but the proposed algorithm also performs significantly better than traditional rule-induction techniques in terms of both accuracy and fidelity.
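
The pedagogical extraction loop described above can be summarized in a few lines. The following is a minimal sketch, assuming scikit-learn's RandomForestClassifier as the black box, a DecisionTreeClassifier as the rule inducer, and Gaussian perturbation around low-confidence training points as the sampling heuristic; these choices are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of pedagogical, active-learning-style rule extraction.
# Assumed components (not taken from the paper): a random forest as the
# black box, a shallow decision tree as the rule inducer, and Gaussian
# perturbation around low-confidence training points as the sampler.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 1. Train the black-box model.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# 2. Identify training points the black box is least confident about
#    (smallest winning-class probability).
confidence = black_box.predict_proba(X).max(axis=1)
low_conf_idx = np.argsort(confidence)[: len(X) // 5]   # least confident 20%

# 3. Generate artificial points around those low-confidence instances.
noise_scale = 0.1 * X.std(axis=0)
X_art = np.concatenate([
    X[low_conf_idx]
    + rng.normal(scale=noise_scale, size=(len(low_conf_idx), X.shape[1]))
    for _ in range(5)                                   # 5 perturbed copies each
])

# 4. Let the black box label them (pedagogical step: only the model's
#    inputs and outputs are used, never its internal structure).
y_art = black_box.predict(X_art)

# 5. Induce a comprehensible model on the enlarged, black-box-labeled data.
X_aug = np.vstack([X, X_art])
y_aug = np.concatenate([black_box.predict(X), y_art])
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_aug, y_aug)

# Fidelity: how often the extracted tree agrees with the black box.
fidelity = accuracy_score(black_box.predict(X), tree.predict(X))
print(f"fidelity to black box on training data: {fidelity:.3f}")
```

Because the black box is queried only through predict and predict_proba, swapping in a neural network or SVM for the random forest, or another rule learner for the decision tree, changes only the two model constructors, which is what makes the approach model-agnostic with respect to both the black box and the rule format.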
