Interpretability via Model Extraction

The ability to interpret machine learning models has become increasingly important now that machine learning is used to inform consequential decisions. We propose an approach called model extraction for interpreting complex, black-box models. Our approach approximates the complex model using a much more interpretable model; as long as the approximation quality is good, statistical properties of the complex model are reflected in the interpretable model. We show how model extraction can be used to understand and debug random forests and neural nets trained on several datasets from the UCI Machine Learning Repository, as well as control policies learned for several classical reinforcement learning problems.
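To make the surrogate-fitting idea concrete, the sketch below trains an interpretable decision tree on the predictions of a black-box random forest and measures how faithfully the tree reproduces it. This is a minimal illustration of the general approach described above, not the paper's specific extraction algorithm; scikit-learn, the breast-cancer dataset, and the depth limit are illustrative assumptions.

```python
# Minimal model-extraction sketch (assumed setup: scikit-learn, UCI breast-cancer data).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the complex, black-box model.
blackbox = RandomForestClassifier(n_estimators=200, random_state=0)
blackbox.fit(X_train, y_train)

# Fit an interpretable surrogate to the black-box model's predictions
# (not the ground-truth labels), so the surrogate mimics the black box.
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0)
surrogate.fit(X_train, blackbox.predict(X_train))

# Fidelity: agreement between surrogate and black box on held-out data.
# High fidelity is what licenses reading the black box's behavior off the tree.
fidelity = accuracy_score(blackbox.predict(X_test), surrogate.predict(X_test))
print(f"surrogate fidelity to black-box model: {fidelity:.3f}")
```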
