Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result: for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
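To make the mechanics concrete, the sketch below illustrates, under stated assumptions, how a CAV could be learned and a TCAV score computed once layer activations and per-example logit gradients have been extracted as arrays. A linear classifier (here scikit-learn's LogisticRegression, an illustrative choice rather than the paper's exact implementation) separates concept activations from random counterexamples, its normal vector serves as the CAV, and the TCAV score is the fraction of class examples whose directional derivative along the CAV is positive. All array names and shapes are hypothetical.

```python
# Minimal sketch of CAV training and TCAV scoring, assuming activations and
# logit gradients at a chosen layer have already been extracted as NumPy arrays.
# Names (concept_acts, random_acts, grads) are illustrative, not from the paper's code.
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept activations from random ones;
    the CAV is the unit-normalized normal vector of the decision boundary,
    pointing toward the concept class."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)


def tcav_score(grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of examples whose directional derivative along the CAV is
    positive, i.e. whose class logit increases when the layer activation
    moves toward the concept."""
    directional_derivs = grads @ cav
    return float(np.mean(directional_derivs > 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128  # flattened width of the chosen layer (illustrative)
    concept_acts = rng.normal(1.0, 1.0, size=(50, d))   # e.g. activations of "striped" images
    random_acts = rng.normal(0.0, 1.0, size=(50, d))    # activations of random counterexamples
    grads = rng.normal(0.2, 1.0, size=(200, d))         # d(logit_zebra) / d(layer activation)
    cav = train_cav(concept_acts, random_acts)
    print(f"TCAV score: {tcav_score(grads, cav):.2f}")
```

In the paper, the score's reliability is additionally assessed by repeating this procedure with multiple random counterexample sets and applying a statistical test; the sketch omits that step for brevity.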
