When Explanations Lie: Why Modified BP Attribution Fails

Modified backpropagation methods are a popular family of attribution methods. We analyse the most prominent ones: Deep Taylor Decomposition, Layer-wise Relevance Propagation, Excitation BP, PatternAttribution, Deconv, and Guided BP. We find empirically that the explanations produced by these modified BP methods are independent of the parameters of later layers, and we show that the z+ rule used by several of them yields a backward matrix that converges to a rank-1 matrix. This explains why the network's actual decision is ignored: a rank-1 backward pass maps every output relevance vector to essentially the same input attribution, regardless of the predicted class. We also develop a new metric, cosine similarity convergence (CSC), to directly quantify how quickly the modified BP methods converge to a rank-1 matrix. We conclude that many modified BP methods do not faithfully explain the predictions of deep neural networks.
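To make the rank-1 convergence concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not the paper's implementation: the z+ backward pass multiplies relevance by a non-negative matrix at each layer, so we stand in random non-negative matrices for those backward matrices and propagate two different output relevance vectors (e.g. for two different classes) backward, printing their cosine similarity per layer; this per-layer similarity is the quantity that CSC tracks.

```python
# Hedged sketch: products of random non-negative matrices converge to rank-1,
# so two distinct output relevance vectors align as they are backpropagated.
# The matrices, dimensions, and normalization here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two 1-d vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

n_layers, dim = 12, 64

# Stand-ins for the non-negative backward matrices of the z+ rule.
backward_mats = [np.abs(rng.normal(size=(dim, dim))) for _ in range(n_layers)]

# Two unrelated non-negative relevance vectors at the output.
r1 = np.abs(rng.normal(size=dim))
r2 = np.abs(rng.normal(size=dim))

for depth, M in enumerate(backward_mats, start=1):
    r1, r2 = M @ r1, M @ r2
    r1, r2 = r1 / r1.sum(), r2 / r2.sum()  # rescale to avoid overflow
    print(f"layer {depth:2d}: cosine similarity = {cosine(r1, r2):.6f}")

# The similarity approaches 1 within a few layers: the backward pass
# "forgets" which output the relevance started from, i.e. the parameters
# of later layers no longer influence the explanation.
```

In this toy setting the similarity typically exceeds 0.99 after a handful of layers, which mirrors the paper's empirical finding that explanations become independent of later-layer parameters in deep networks.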
