Definitions, methods, and applications in interpretable machine learning

Machine learning (ML) has recently received considerable attention for its ability to accurately predict a wide variety of complex phenomena. However, there is a growing realization that, in addition to predictions, ML models can produce knowledge about domain relationships contained in data, often referred to as interpretations. These interpretations have found uses both in their own right, e.g. in medicine (1), policy-making (2), and science (3, 4), and in auditing the predictions themselves in response to issues such as regulatory pressure (5) and fairness (6). In the absence of a well-formed definition of interpretability, a broad range of methods with a correspondingly broad range of outputs (e.g. visualizations, natural language, mathematical equations) have been labeled as interpretations. This has led to considerable confusion about the notion of interpretability. In particular, it is unclear what it means to interpret something, what common threads exist among disparate methods, and how to select an interpretation method for a particular problem and audience. In this paper, we attempt to address these concerns. To do so, we first define interpretability in the context of machine learning and place it within a generic data science life cycle. This allows us to distinguish between two main classes of interpretation methods: model-based and post hoc. We then introduce the Predictive, Descriptive, Relevant (PDR) framework, consisting of three desiderata for evaluating and constructing interpretations: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience.
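
To make the distinction between the two classes concrete, the following minimal sketch (not taken from the paper; scikit-learn, the synthetic dataset, and all hyperparameters are illustrative assumptions) contrasts a model-based interpretation, where the fitted model is simple enough that domain relationships can be read directly from its form (sparse lasso coefficients), with a post hoc interpretation, where a black-box model is probed after fitting (permutation importance of a random forest).

```python
# Hypothetical illustration of model-based vs. post hoc interpretation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real scientific or clinical dataset.
X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model-based interpretation: the fitted model itself is the interpretation;
# here, sparse linear coefficients indicate which features drive predictions.
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print("lasso coefficients:", np.round(lasso.coef_, 2))

# Post hoc interpretation: first fit a more complex black-box model, then
# probe it after the fact by permuting each feature and measuring the drop
# in held-out performance.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
print("permutation importances:", np.round(result.importances_mean, 2))
```

Either output can then be judged against the PDR desiderata: how faithfully it describes what the model actually does, how well the underlying model predicts, and how relevant the resulting summary is to the intended audience.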

[1] Bin Yu et al. Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning, 2017, ArXiv.

[2] Arthur Szlam et al. Automatic Rule Extraction from Long Short Term Memory Networks, 2016, ICLR.

[3] Y. C. Pati et al. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, 1993, Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers.

[4] Yarin Gal et al. Real Time Image Saliency for Black Box Classifiers, 2017, NIPS.

[5] Terrence J. Sejnowski et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution, 1995, Neural Computation.

[6] Quanshi Zhang et al. Interpreting CNN knowledge via an Explanatory Graph, 2017, AAAI.

[7] Haiyan Huang et al. Biclustering by sparse canonical correlation analysis, 2018, Quantitative Biology.

[8] Russell G. Death et al. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data, 2004.

[9] T. Lombrozo. The structure and function of explanations, 2006, Trends in Cognitive Sciences.

[10] Nathan Srebro et al. Equality of Opportunity in Supervised Learning, 2016, NIPS.

[11] John David N. Dionisio et al. Case-based explanation of non-case-based learning methods, 1999, AMIA.

[12] Franco Turini et al. A Survey of Methods for Explaining Black Box Models, 2018, ACM Computing Surveys.

[13] G. Box. Science and Statistics, 1976.

[14] Christine D. Piatko et al. Using "Annotator Rationales" to Improve Machine Learning for Text Categorization, 2007, NAACL.

[15] Seth Flaxman et al. European Union Regulations on Algorithmic Decision-Making and a "Right to Explanation", 2016, AI Magazine.

[16] Cedric E. Ginestet. ggplot2: Elegant Graphics for Data Analysis, 2011.

[17] Bolei Zhou et al. Understanding Intra-Class Knowledge Inside CNN, 2015, ArXiv.

[18] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[19] Michael Tsang et al. Can I trust you more? Model-Agnostic Hierarchical Explanations, 2018, ArXiv.

[20] Yung-Seop Lee et al. Enriched random forests, 2008, Bioinformatics.

[21] John Law et al. Robust Statistics: The Approach Based on Influence Functions, 1986.

[22] Rob Fergus et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.

[23] Trevor Darrell et al. Grounding of Textual Phrases in Images by Reconstruction, 2015, ECCV.

[24] Bin Yu et al. Estimation Stability With Cross-Validation (ESCV), 2013, arXiv:1303.3128.

[25] Scott Lundberg et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.

[26] Andrew Slavin Ross et al. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations, 2017, IJCAI.

[27] Carlos Guestrin et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[28] Guillermo Sapiro et al. A Shared Vision for Machine Learning in Neuroscience, 2018, The Journal of Neuroscience.

[29] Thomas Lengauer et al. Permutation importance: a corrected feature importance measure, 2010, Bioinformatics.

[30] Percy Liang et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.

[31] James B. Brown et al. Iterative random forests to discover predictive and stable high-order interactions, 2017, Proceedings of the National Academy of Sciences.

[32] Bin Yu et al. Superheat: Supervised heatmaps for visualizing complex data, 2015.

[33] Mu-Chen Chen et al. Credit scoring with a data mining approach based on support vector machines, 2007, Expert Systems with Applications.

[34] H. Hotelling. Relations Between Two Sets of Variates, 1936.

[35] Leo Breiman. Random Forests, 2001, Machine Learning.

[36] Bill Triggs et al. Histograms of oriented gradients for human detection, 2005, IEEE CVPR.

[37] Jack L. Gallant et al. The DeepTune framework for modeling and characterizing neurons in visual cortex area V4, 2018, bioRxiv.

[38] Shafi Goldwasser et al. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 2012.

[39] Johannes Gehrke et al. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission, 2015, KDD.

[40] Patrick D. McDaniel et al. Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning, 2018, ArXiv.

[41] Yan Liu et al. Detecting Statistical Interactions from Neural Network Weights, 2017, ICLR.

[42] O. Stegle et al. Deep learning for computational biology, 2016, Molecular Systems Biology.

[43] Dan Klein et al. Neural Module Networks, 2016, IEEE CVPR.

[44] Bin Yu et al. Refining interaction search through signed iterative Random Forests, 2018, bioRxiv.

[45] Alexander M. Rush et al. Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks, 2016, ArXiv.

[46] Bevil R. Conway et al. Toward a Unified Theory of Visual Area V4, 2012, Neuron.

[47] David R. Anderson et al. Multimodel Inference, 2004.

[48] Deborah Silver et al. Feature Visualization, 1994, Scientific Visualization.

[49] Nir Friedman et al. Probabilistic Graphical Models: Principles and Techniques, 2009.

[50] Ian T. Jolliffe. Principal Component Analysis, 2002, International Encyclopedia of Statistical Science.

[51] Finale Doshi-Velez et al. A Roadmap for a Rigorous Science of Interpretability, 2017, ArXiv.

[52] R. Tibshirani et al. Generalized additive models for medical research, 1986, Statistical Methods in Medical Research.

[53] Regina Barzilay et al. Rationalizing Neural Predictions, 2016, EMNLP.

[54] David J. Field et al. Sparse coding with an overcomplete basis set: A strategy employed by V1?, 1997, Vision Research.

[55] Juan Enrique Ramos et al. Using TF-IDF to Determine Word Relevance in Document Queries, 2003.

[56] Leo Breiman et al. Classification and Regression Trees, 1984.

[57] Chandan Singh et al. Hierarchical interpretations for neural network predictions, 2018, ICLR.

[58] Bin Yu et al. Daytime Arctic Cloud Detection Based on Multi-Angle Satellite Data With Case Studies, 2008.

[59] Leo Breiman. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author), 2001.

[60] Brian E. Granger et al. IPython: A System for Interactive Scientific Computing, 2007, Computing in Science & Engineering.

[61] Anna Shcherbina et al. Not Just a Black Box: Learning Important Features Through Propagating Activation Differences, 2016, ArXiv.

[62] D. Freedman. Statistical models and shoe leather, 1989.

[63] Cynthia Rudin et al. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model, 2015, ArXiv.

[64] Bogdan E. Popescu et al. Predictive Learning via Rule Ensembles, 2008, arXiv:0811.1679.

[65] D. Rubin et al. Causal Inference for Statistics, Social, and Biomedical Sciences: Sensitivity Analysis and Bounds, 2015.

[66] Martin Wattenberg et al. SmoothGrad: removing noise by adding noise, 2017, ArXiv.

[67] Alun D. Preece et al. Interpretability of deep learning models: A survey of results, 2017, IEEE SmartWorld.

[68] Motoaki Kawanabe et al. How to Explain Individual Classification Decisions, 2009, Journal of Machine Learning Research.

[69] T. Kluyver et al. Jupyter Notebooks: a publishing format for reproducible computational workflows, 2016, ELPUB.

[70] R. Tibshirani et al. Generalized Additive Models, 1991.

[71] Scott M. Lundberg et al. Consistent Individualized Feature Attribution for Tree Ensembles, 2018, ArXiv.

[72] Max Welling et al. Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, 2017, ICLR.

[73] Lalana Kagal et al. Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning, 2018.

[74] Fei-Fei Li et al. Visualizing and Understanding Recurrent Networks, 2015, ArXiv.

[75] Ankur Taly et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[76] D. Boyd et al. Critical Questions for Big Data, 2012.

[77] Bram van Ginneken et al. A survey on deep learning in medical image analysis, 2017, Medical Image Analysis.

[78] Bin Yu et al. Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs, 2018, ICLR.

[79] F. Keil et al. Explanation and understanding, 2015.

[80] John F. Canny et al. Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention, 2017, IEEE ICCV.

[81] Wes McKinney. Data Structures for Statistical Computing in Python, 2010, SciPy.

[82] Thomas Brox et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.

[83] Cengiz Öztireli et al. Towards better understanding of gradient-based attribution methods for Deep Neural Networks, 2017, ICLR.

[84] Cynthia Rudin. Please Stop Explaining Black Box Models for High Stakes Decisions, 2018, ArXiv.

[85] William L. Oliver et al. The Emergence of Machine Learning Techniques in Criminology, 2013.

[86] Achim Zeileis et al. Conditional variable importance for random forests, 2008, BMC Bioinformatics.

[87] Yair Zick et al. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems, 2016, IEEE Symposium on Security and Privacy.

[88] Siqi Wu et al. Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks, 2016, Proceedings of the National Academy of Sciences.