Genetic rule extraction optimizing Brier score

Most highly accurate predictive modeling techniques produce opaque models. When comprehensible models are required, rule extraction is sometimes used to generate a transparent model based on the opaque one. Naturally, the extracted model should be as similar as possible to the opaque model. This criterion, called fidelity, is therefore a key part of the optimization function in most rule extraction algorithms. To the best of our knowledge, all existing rule extraction algorithms targeting fidelity use 0/1 fidelity, i.e., they maximize the number of identical classifications. In this paper, we suggest and evaluate a rule extraction algorithm utilizing a more informed fidelity criterion. More specifically, the novel algorithm, which is based on genetic programming, minimizes the difference in probability estimates between the extracted and the opaque models by using the generalized Brier score as the fitness function. Experimental results from 26 UCI data sets show that the suggested algorithm obtained considerably higher accuracy and significantly better AUC than both the exact same rule extraction algorithm maximizing 0/1 fidelity and the standard tree inducer J48. Somewhat surprisingly, rule extraction using the more informed fidelity metric normally resulted in less complex models, ensuring that the improved predictive performance was not achieved at the expense of comprehensibility.
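The key difference between the two fidelity criteria is what they compare: 0/1 fidelity counts only agreement on hard class labels, whereas the generalized Brier score accumulates, per instance, the squared distance between the class-probability vector of the opaque model and that of the candidate rule set. Below is a minimal sketch of both fitness variants, assuming the two models' probability estimates are available as NumPy arrays of shape (instances, classes); the function names are hypothetical illustrations, not taken from the paper:

    import numpy as np

    def brier_fidelity(opaque_probs, extracted_probs):
        # Generalized Brier score against the opaque model's estimates:
        # mean over instances of the squared distance between the two
        # class-probability vectors (lower = higher fidelity).
        return np.mean(np.sum((opaque_probs - extracted_probs) ** 2, axis=1))

    def zero_one_fidelity(opaque_probs, extracted_probs):
        # Conventional 0/1 fidelity: fraction of instances where both
        # models predict the same class (higher = better).
        same = opaque_probs.argmax(axis=1) == extracted_probs.argmax(axis=1)
        return np.mean(same)

In a genetic-programming loop, each candidate rule set would then be scored by minimizing brier_fidelity rather than maximizing zero_one_fidelity, so candidates that match the opaque model's probability estimates, and not merely its hard classifications, are favored during evolution.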
