Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net's internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result: for example, how sensitive a prediction of "zebra" is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
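To make the mechanics concrete, the sketch below illustrates, under stated assumptions, how a CAV could be learned and a TCAV score computed once layer activations and per-example logit gradients have been extracted as arrays. A linear classifier (here scikit-learn's LogisticRegression, an illustrative choice rather than the paper's exact implementation) separates concept activations from random counterexamples, its normal vector serves as the CAV, and the TCAV score is the fraction of class examples whose directional derivative along the CAV is positive. All array names and shapes are hypothetical.

```python
# Minimal sketch of CAV training and TCAV scoring, assuming activations and
# logit gradients at a chosen layer have already been extracted as NumPy arrays.
# Names (concept_acts, random_acts, grads) are illustrative, not from the paper's code.
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept activations from random ones;
    the CAV is the unit-normalized normal vector of the decision boundary,
    pointing toward the concept class."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)


def tcav_score(grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of examples whose directional derivative along the CAV is
    positive, i.e. whose class logit increases when the layer activation
    moves toward the concept."""
    directional_derivs = grads @ cav
    return float(np.mean(directional_derivs > 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128  # flattened width of the chosen layer (illustrative)
    concept_acts = rng.normal(1.0, 1.0, size=(50, d))   # e.g. activations of "striped" images
    random_acts = rng.normal(0.0, 1.0, size=(50, d))    # activations of random counterexamples
    grads = rng.normal(0.2, 1.0, size=(200, d))         # d(logit_zebra) / d(layer activation)
    cav = train_cav(concept_acts, random_acts)
    print(f"TCAV score: {tcav_score(grads, cav):.2f}")
```

In the paper, the score's reliability is additionally assessed by repeating this procedure with multiple random counterexample sets and applying a statistical test; the sketch omits that step for brevity.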
