Explainable decision forest: Transforming a decision forest into an interpretable tree

Abstract

Decision forests are considered best practice in many machine learning tasks, mainly because of their superior predictive performance. However, simple models such as decision trees may be preferred over decision forests in cases where predictions must be computed efficiently or be interpretable (e.g., in insurance or health-related use cases). This paper presents a novel method for transforming a decision forest into an interpretable decision tree, which aims to preserve the predictive performance of the forest while enabling efficient classifications that humans can understand. This is done by creating a set of rule conjunctions that represent the original decision forest; the conjunctions are then organized hierarchically to form a new decision tree. We evaluate the proposed method on 33 UCI datasets and show that the resulting model typically approximates the ROC AUC of a random forest while providing an interpretable decision path for each classification.
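To make the first step of the pipeline concrete, the sketch below extracts rule conjunctions (root-to-leaf paths) from a trained scikit-learn RandomForestClassifier. This is a hypothetical illustration, not the paper's implementation: the helper name extract_conjunctions and the traversal details are assumptions, and the second step (hierarchically organizing the conjunctions into a single tree) is only noted, not implemented.

```python
# Minimal sketch: enumerate every root-to-leaf rule conjunction in a
# trained random forest, assuming the scikit-learn tree_ API.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def extract_conjunctions(forest):
    """Return one entry per leaf in the forest: a list of
    (feature_index, threshold, is_upper_bound) tests describing the
    root-to-leaf path, paired with the leaf's class distribution."""
    conjunctions = []
    for estimator in forest.estimators_:
        tree = estimator.tree_
        def walk(node, tests):
            if tree.children_left[node] == -1:  # leaf node
                probs = tree.value[node][0] / tree.value[node][0].sum()
                conjunctions.append((tests, probs))
                return
            f, t = tree.feature[node], tree.threshold[node]
            # Left child satisfies feature <= threshold (an upper bound);
            # right child satisfies feature > threshold (a lower bound).
            walk(tree.children_left[node], tests + [(f, t, True)])
            walk(tree.children_right[node], tests + [(f, t, False)])
        walk(0, [])
    return conjunctions

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
rules = extract_conjunctions(forest)
print(f"{len(rules)} conjunctions extracted from {len(forest.estimators_)} trees")
```

Each conjunction pairs a list of threshold tests with the leaf's class distribution, which is exactly the information a subsequent step would need in order to merge overlapping conjunctions and arrange them into one interpretable tree.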
