A Benchmark for Interpretability Methods in Deep Neural Networks

We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce feature importance estimates that are no better than a random designation of importance. Only certain ensemble-based approaches, VarGrad and SmoothGrad-Squared, outperform such a random assignment of importance. The manner of ensembling remains critical: we show that some ensembling approaches do no better than the underlying method while carrying a far higher computational burden.
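As a rough illustration only (not the paper's exact implementation), both named ensembles can be built on top of any per-input saliency estimator: SmoothGrad-Squared averages squared saliency maps over noise-perturbed copies of the input, while VarGrad takes the variance of those maps. The names `saliency_fn`, `sigma`, and `n_samples` below are illustrative assumptions, and the toy `saliency_fn` stands in for a real gradient-based estimator.

```python
import numpy as np

def smoothgrad_squared(saliency_fn, x, n_samples=15, sigma=0.15, seed=0):
    """Mean of squared saliency maps over Gaussian-perturbed copies of x (sketch)."""
    rng = np.random.default_rng(seed)
    maps = [saliency_fn(x + rng.normal(0.0, sigma, size=x.shape)) ** 2
            for _ in range(n_samples)]
    return np.mean(maps, axis=0)

def vargrad(saliency_fn, x, n_samples=15, sigma=0.15, seed=0):
    """Variance of saliency maps over Gaussian-perturbed copies of x (sketch)."""
    rng = np.random.default_rng(seed)
    maps = [saliency_fn(x + rng.normal(0.0, sigma, size=x.shape))
            for _ in range(n_samples)]
    return np.var(maps, axis=0)

if __name__ == "__main__":
    # Toy stand-in for an input-gradient saliency estimator: gradient of sum(x**2).
    toy_saliency = lambda x: 2.0 * x
    x = np.random.rand(8, 8)  # stand-in "image"
    print(smoothgrad_squared(toy_saliency, x).shape)
    print(vargrad(toy_saliency, x).shape)
```

Both ensembles cost `n_samples` saliency evaluations per input, which is why the choice of aggregation (squared mean or variance versus a plain mean) matters relative to the added compute.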
