Rule Extraction from Decision Trees Ensembles: New Algorithms Based on Heuristic Search and Sparse Group Lasso Methods

Decision trees are easily interpretable models, but their predictive accuracy is typically low. In contrast, decision tree ensembles (DTEs) such as random forest (RF) achieve high predictive accuracy but are regarded as black-box models. We propose three new algorithms for extracting rules from DTEs. In the RF+DHC method, hill climbing with downhill moves (DHC) is used to search for a rule set that dramatically reduces the number of rules. In the RF+SGL and RF+MSGL methods, the sparse group lasso (SGL) and the multiclass SGL (MSGL) methods, respectively, are employed to find a sparse weight vector corresponding to the rules generated by RF. Experimental results on 24 data sets show that the proposed methods outperform similar state-of-the-art methods in terms of human comprehensibility: they greatly reduce the number of rules and limit the number of antecedents in the retained rules while preserving the same level of accuracy.
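As an illustration of the idea behind hill climbing with downhill moves over a rule set, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the scoring function, the neighbourhood, or the acceptance rule, so everything here is an assumption: the hypothetical `rule_set_score` (majority-vote accuracy minus a size penalty), the single-rule flip move, the `p_downhill` acceptance probability, and the toy `votes` matrix, in which each column stands for one rule and holds the class it predicts for a sample, or `None` when the rule does not fire.

```python
import random


def rule_set_score(mask, votes, labels, penalty=0.01):
    """Hypothetical objective: majority-vote accuracy minus a size penalty."""
    selected = [j for j, keep in enumerate(mask) if keep]
    if not selected:
        return float("-inf")  # an empty rule set cannot classify anything
    correct = 0
    for row, y in zip(votes, labels):
        tally = {}
        for j in selected:
            if row[j] is not None:  # rule j fires on this sample
                tally[row[j]] = tally.get(row[j], 0) + 1
        if tally and max(tally, key=tally.get) == y:
            correct += 1
    return correct / len(labels) - penalty * len(selected)


def hill_climb_with_downhill(votes, labels, n_iter=2000, p_downhill=0.05, seed=0):
    """Flip one rule at a time; occasionally accept a worse rule set."""
    rng = random.Random(seed)
    n_rules = len(votes[0])
    current = [True] * n_rules  # start from the full rule set
    current_score = rule_set_score(current, votes, labels)
    best, best_score = current[:], current_score
    for _ in range(n_iter):
        j = rng.randrange(n_rules)
        candidate = current[:]
        candidate[j] = not candidate[j]  # add or remove rule j
        cand_score = rule_set_score(candidate, votes, labels)
        # Accept uphill or sideways moves always, downhill moves with
        # probability p_downhill (assumed rule, used to escape local optima).
        if cand_score >= current_score or rng.random() < p_downhill:
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current[:], current_score
    return best, best_score


# Toy data (made up): 4 samples, 4 rules, 2 classes.
votes = [
    [0, None, 0, 1],
    [0, None, None, 1],
    [None, 1, 1, 1],
    [None, 1, None, 0],
]
labels = [0, 0, 1, 1]

best, best_score = hill_climb_with_downhill(votes, labels)
print(sum(best), "rules kept, score", round(best_score, 3))
```

Because the best rule set seen so far is tracked separately from the current one, accepting an occasional downhill move never makes the returned result worse than the starting rule set.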
