Explanations for Attributing Deep Neural Network Predictions

Given the recent success of deep neural networks and their deployment in high-impact, high-risk applications such as autonomous driving and healthcare decision-making, there is a pressing need for faithful and interpretable explanations of why an algorithm makes a particular prediction. In this chapter, we introduce (1) Meta-Predictors as Explanations, a principled framework for learning explanations for any black-box algorithm, and (2) Meaningful Perturbations, an instantiation of our paradigm applied to the problem of attribution, which is concerned with identifying which features of an input (e.g., regions of an input image) are responsible for a model's output (e.g., a CNN classifier's predicted object class). We first introduced these contributions in [8]. We also briefly survey existing visual attribution methods and highlight how they fail to be both faithful and interpretable.
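To make the attribution setting concrete, the following minimal PyTorch sketch illustrates the perturbation-based idea that Meaningful Perturbations builds on: optimize a soft mask over the image so that "deleting" (here, blurring) the masked-out regions suppresses the classifier's score for a target class, while a sparsity penalty keeps the deleted region small. This is an illustrative sketch under stated assumptions, not the chapter's implementation; the function name `meaningful_perturbation_sketch`, the mask resolution, the blur used as the deletion reference, and all hyperparameter defaults are assumptions.

```python
import torch
import torch.nn.functional as F

def meaningful_perturbation_sketch(model, image, target_class,
                                   steps=300, lr=0.1, lam=0.05):
    """Learn a soft mask over `image` (shape 1x3xHxW) whose low values mark
    regions that, when blurred out, most reduce the score for `target_class`.
    All names and hyperparameters here are illustrative assumptions."""
    model.eval()
    # A heavily blurred copy of the image serves as the "deleted" reference.
    blurred = F.avg_pool2d(image, kernel_size=11, stride=1, padding=5)

    # Optimize a coarse mask and upsample it, which discourages adversarial,
    # high-frequency artifacts in the learned perturbation.
    mask_logits = torch.zeros(1, 1, 28, 28, requires_grad=True)
    optimizer = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(steps):
        m = torch.sigmoid(mask_logits)  # mask values in [0, 1]; 1 = preserve
        m_up = F.interpolate(m, size=image.shape[-2:], mode='bilinear',
                             align_corners=False)
        perturbed = m_up * image + (1.0 - m_up) * blurred
        score = torch.softmax(model(perturbed), dim=1)[0, target_class]
        # Drive the target-class score down while deleting as little as possible.
        loss = score + lam * (1.0 - m_up).abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Regions where the final mask is low are attributed as most responsible
    # for the original prediction.
    return torch.sigmoid(mask_logits).detach()
```

The sparsity term here trades off how much evidence is deleted against how strongly the class score drops; the full method developed in the chapter includes additional regularization and design choices not shown in this sketch.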

[1] Avanti Shrikumar et al., Learning Important Features Through Propagating Activation Differences, 2017, ICML.

[2] Klaus-Robert Müller et al., "What is relevant in a text document?": An interpretable machine learning approach, 2016, PLoS ONE.

[3] Carlos Guestrin et al., "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, arXiv.

[4] Bolei Zhou et al., Object Detectors Emerge in Deep Scene CNNs, 2014, ICLR.

[5] Rob Fergus et al., Visualizing and Understanding Convolutional Networks, 2013, ECCV.

[6] Andrea Vedaldi et al., Understanding Image Representations by Measuring Their Equivariance and Equivalence, 2014, International Journal of Computer Vision.

[7] Andrea Vedaldi et al., Interpretable Explanations of Black Boxes by Meaningful Perturbation, 2017, IEEE International Conference on Computer Vision (ICCV).

[8] Jimmy Ba et al., Adam: A Method for Stochastic Optimization, 2014, ICLR.

[9] Zhe L. Lin et al., Top-Down Neural Attention by Excitation Backprop, 2016, International Journal of Computer Vision.

[10] Samy Bengio et al., Adversarial examples in the physical world, 2016, ICLR.

[11] Andrew Zisserman et al., Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2013, ICLR.

[12] Andrea Vedaldi et al., Understanding deep image representations by inverting them, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Geoffrey E. Hinton et al., ImageNet classification with deep convolutional neural networks, 2012, Communications of the ACM.

[14] Trevor Darrell et al., Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[15] Kate Saenko et al., RISE: Randomized Input Sampling for Explanation of Black-box Models, 2018, BMVC.

[16] Andrea Vedaldi et al., Net2Vec: Quantifying and Explaining How Concepts are Encoded by Filters in Deep Neural Networks, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Abhishek Das et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2017, IEEE International Conference on Computer Vision (ICCV).

[18] Wojciech Samek et al., Methods for interpreting and understanding deep neural networks, 2017, Digital Signal Processing.

[19] Cengiz Öztireli et al., Towards better understanding of gradient-based attribution methods for Deep Neural Networks, 2017, ICLR.

[20] Ryan Turner et al., A model explanation system, 2016, IEEE International Workshop on Machine Learning for Signal Processing (MLSP).

[21] Thomas Brox et al., Striving for Simplicity: The All Convolutional Net, 2014, ICLR.

[22] Alexander Binder et al., Evaluating the Visualization of What a Deep Neural Network Has Learned, 2015, IEEE Transactions on Neural Networks and Learning Systems.

[23] Markus H. Gross et al., Gradient-Based Attribution Methods, 2019, Explainable AI.

[24] Alexander Binder et al., On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation, 2015, PLoS ONE.

[25] Jonathan Dodge et al., Visualizing and Understanding Atari Agents, 2017, ICML.

[26] Been Kim et al., Sanity Checks for Saliency Maps, 2018, NeurIPS.

[27] Dumitru Erhan et al., Going deeper with convolutions, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Jason Yosinski et al., Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Martin Wattenberg et al., SmoothGrad: removing noise by adding noise, 2017, arXiv.

[30] Bolei Zhou et al., Learning Deep Features for Discriminative Localization, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Wei Xu et al., Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks, 2015, IEEE International Conference on Computer Vision (ICCV).

[32] Yarin Gal et al., Real Time Image Saliency for Black Box Classifiers, 2017, NIPS.

[33] Andrea Vedaldi et al., Salient Deconvolutional Networks, 2016, ECCV.

[34] Ankur Taly et al., Axiomatic Attribution for Deep Networks, 2017, ICML.