Definitions, methods, and applications in interpretable machine learning

Significance

The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models. We aim to clarify these concerns by defining interpretable machine learning and constructing a unifying framework for existing methods that highlights the underappreciated role played by human audiences. Within this framework, methods are organized into 2 classes: model-based and post hoc. To provide guidance in selecting and evaluating interpretation methods, we introduce 3 desiderata: predictive accuracy, descriptive accuracy, and relevancy. Using our framework, we review existing work, grounded in real-world studies that exemplify our desiderata, and suggest directions for future work.

Abstract

Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods is related and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the predictive, descriptive, relevant (PDR) framework for discussing interpretations. The PDR framework provides 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post hoc categories, with subgroups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often underappreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
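To make the model-based versus post hoc distinction concrete, the following minimal Python sketch (not taken from the paper; it assumes scikit-learn, uses its diabetes dataset, and picks an illustrative regularization strength) contrasts a sparse linear model, whose interpretation is the fitted model itself, with permutation importance computed after the fact on a black-box random forest.

# Minimal sketch, assuming scikit-learn: contrasts model-based interpretability
# (sparsity in a lasso) with a post hoc interpretation (permutation importance
# of a random forest). Dataset and alpha are chosen only for illustration.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

data = load_diabetes()
X, y = data.data, data.target

# Model-based: the L1 penalty drives many coefficients to exactly zero,
# so the fitted model itself is the interpretation (which features it uses).
sparse_model = Lasso(alpha=0.5).fit(X, y)
for name, coef in zip(data.feature_names, sparse_model.coef_):
    if coef != 0:
        print(f"lasso keeps {name}: {coef:.2f}")

# Post hoc: fit a black-box model first, then interrogate it afterwards
# by measuring how much shuffling each feature degrades its predictions.
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
for name, imp in zip(data.feature_names, result.importances_mean):
    print(f"permutation importance of {name}: {imp:.3f}")

Roughly, in the framework's terms, the lasso coefficients describe the sparse model exactly (high descriptive accuracy, possibly at some cost in predictive accuracy), while the permutation scores only approximate what the more flexible forest has learned; which trade-off matters is judged relative to the human audience.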
