Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing interpretability approaches that focus on explaining relationships between features and predictions, we propose an alternative approach: interpreting tree ensemble classifiers by surfacing representative points for each class -- prototypes. We introduce a new distance for gradient boosted tree models and propose new, adaptive prototype selection methods with theoretical guarantees, with the flexibility to choose a different number of prototypes in each class. We demonstrate our methods on random forests and gradient boosted trees, showing that the prototypes can perform as well as or even better than the original tree ensemble when used as a nearest-prototype classifier. In a user study, humans were better at predicting the output of a tree ensemble classifier when using prototypes than when using Shapley values, a popular feature attribution method. Hence, prototypes present a viable alternative to feature-based explanations for tree ensembles.
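To make the nearest-prototype idea concrete, here is a minimal sketch in Python. It assumes the classical random-forest proximity distance (one minus the fraction of trees in which two points land in the same leaf), not the paper's new distance for gradient boosted trees; the function names and the `(leaf_vector, label)` prototype representation are illustrative choices, not the paper's API. Per-tree leaf indices of this kind can be obtained, for example, from scikit-learn's `RandomForestClassifier.apply(X)`.

```python
def proximity_distance(leaves_a, leaves_b):
    """Tree-space distance between two points, each represented by its
    per-tree leaf indices: 1 - (fraction of trees sharing a leaf)."""
    shared = sum(la == lb for la, lb in zip(leaves_a, leaves_b))
    return 1.0 - shared / len(leaves_a)


def nearest_prototype_label(x_leaves, prototypes):
    """Classify a point by the class of its closest prototype.

    prototypes: list of (leaf_vector, class_label) pairs, where each
    leaf_vector holds the prototype's leaf index in every tree."""
    return min(prototypes,
               key=lambda p: proximity_distance(x_leaves, p[0]))[1]


# Toy example with a 3-tree ensemble: each point is its leaf index per tree.
protos = [([0, 1, 2], "a"), ([3, 4, 5], "b")]
print(nearest_prototype_label([0, 1, 5], protos))  # shares 2/3 leaves with "a"
```

Because the distance is computed purely from leaf co-occurrence, the resulting nearest-prototype classifier inherits the ensemble's learned partitioning of the input space while exposing only a handful of representative training points per class.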
