Interpreting Blackbox Models via Model Extraction

Interpretability has become critically important as machine learning is increasingly used to inform consequential decisions. We propose to construct global explanations of complex, blackbox models in the form of a decision tree that approximates the original model; as long as the decision tree is a faithful approximation, it mirrors the computation performed by the blackbox model. We devise a novel algorithm for extracting decision tree explanations that actively samples new training points to avoid overfitting. We evaluate our algorithm on a random forest trained to predict diabetes risk and on a learned controller for the cart-pole task. Compared to several baselines, our decision trees are substantially more accurate and, based on a user study, equally or more interpretable. Finally, we describe several insights provided by our interpretations, including a causal issue validated by a physician.
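
To make the extraction idea concrete, the sketch below is a minimal illustration, not the paper's exact algorithm: a random forest plays the blackbox, fresh inputs are drawn from a diagonal Gaussian fit to the training data (a simplification of the active sampling described above), the blackbox labels those inputs, and a shallow decision tree is fit to mimic its predictions. All dataset choices and variable names here are illustrative assumptions.

```python
# Minimal sketch of model extraction via a student decision tree.
# Assumes scikit-learn and numpy; the i.i.d. Gaussian resampling below
# stands in for the paper's more involved active sampling scheme.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Train the blackbox model we want to explain.
blackbox = RandomForestClassifier(n_estimators=100, random_state=0)
blackbox.fit(X_train, y_train)

# 2. Fit a simple input distribution (diagonal Gaussian) to the training
#    inputs and sample extra points from it.
mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
X_extra = np.random.default_rng(0).normal(mean, std, size=(5000, X.shape[1]))

# 3. Label original and sampled inputs with the blackbox; the student
#    tree learns to mimic its predictions, not the ground-truth labels.
X_student = np.vstack([X_train, X_extra])
y_student = blackbox.predict(X_student)

# 4. Fit a small, interpretable decision tree to the relabeled data.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_student, y_student)

# Fidelity: how often the tree agrees with the blackbox on held-out data.
fidelity = (tree.predict(X_test) == blackbox.predict(X_test)).mean()
print(f"fidelity to blackbox on test set: {fidelity:.3f}")
```

The extra labeled points matter: fitting the tree on the training inputs alone tends to overfit the blackbox's behavior in densely sampled regions, while the resampled inputs let the student see the blackbox's predictions across the wider input distribution.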
