Robust Explainability: A tutorial on gradient-based attribution methods for deep neural networks

With the rise of deep neural networks, explaining the predictions of these networks has become an increasingly recognized challenge. Although many methods for explaining the decisions of deep neural networks exist, there is currently no consensus on how to evaluate them. Robustness, on the other hand, has long been a popular topic in deep learning research, yet it was hardly discussed in the context of explainability until very recently. In this tutorial paper, we start by presenting gradient-based interpretability methods, which use gradient signals to attribute a network's decision to its input features. We then discuss how gradient-based methods can be evaluated for their robustness and the role that adversarial robustness plays in obtaining meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present the best practices and attributes that should be examined before choosing an explainability method, and we conclude with future directions for research at the intersection of robustness and explainability.

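The abstract's one-line description of gradient-based attribution can be made concrete with a minimal sketch: vanilla gradient saliency backpropagates a class score to the input and reads the per-pixel gradient magnitude as the attribution. The model, input size, and tensor names below are illustrative assumptions, not anything specified in the paper.

```python
import torch
import torch.nn as nn

# Illustrative stand-in classifier; any differentiable image model would do.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

# Stand-in input "image"; requires_grad so gradients flow back to the pixels.
x = torch.rand(1, 3, 32, 32, requires_grad=True)

logits = model(x)
target = logits.argmax(dim=1).item()

# Backpropagate the target-class logit to the input.
logits[0, target].backward()

# Vanilla gradient saliency: per-pixel gradient magnitude, reduced over channels.
saliency = x.grad.abs().max(dim=1).values.squeeze(0)  # shape (32, 32)
print(saliency.shape)
```

More elaborate gradient-based methods discussed in the tutorial (e.g., SmoothGrad, Integrated Gradients, Grad-CAM) build on this same backpropagated signal, differing mainly in how they average, integrate, or localize it.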