Attribution in Scale and Space

We study the attribution problem for deep networks applied to perception tasks. For vision tasks, attribution techniques attribute the prediction of a network to the pixels of the input image. We propose a new technique called Blur Integrated Gradients (Blur IG). This technique has several advantages over other methods. First, it can tell at what scale a network recognizes an object. It produces scores in the scale/frequency dimension, that we find captures interesting phenomena. Second, it satisfies the scale-space axioms, which imply that it employs perturbations that are free of artifact. We therefore produce explanations that are cleaner and consistent with the operation of deep networks. Third, it eliminates the need for baseline parameter for Integrated Gradients for perception tasks. This is desirable because the choice of baseline has a significant effect on the explanations. We compare the proposed technique against previous techniques and demonstrate application on three tasks: ImageNet object recognition, Diabetic Retinopathy prediction, and AudioSet audio event identification. Code and examples are at https://github.com/PAIR-code/saliency.

[1]  Ankur Taly,et al.  Axiomatic Attribution for Deep Networks , 2017, ICML.

[2]  J. Koenderink The structure of images , 2004, Biological Cybernetics.

[3]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[4]  Alexander Binder,et al.  Layer-Wise Relevance Propagation for Neural Networks with Local Renormalization Layers , 2016, ICANN.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[8]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[9]  Subhashini Venugopalan,et al.  Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. , 2016, JAMA.

[10]  Tony Lindeberg,et al.  Scale-Space Theory in Computer Vision , 1993, Lecture Notes in Computer Science.

[11]  Andrea Vedaldi,et al.  Understanding Deep Networks via Extremal Perturbations and Smooth Masks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Eric J. Friedman,et al.  Paths and consistency in additive cost sharing , 2004, Int. J. Game Theory.

[13]  Tolga Bolukbasi,et al.  XRAI: Better Attributions Through Regions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Alexander Mordvintsev,et al.  Inceptionism: Going Deeper into Neural Networks , 2015 .

[16]  Tony Lindeberg,et al.  Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention , 1993, International Journal of Computer Vision.

[17]  Ankur Taly,et al.  Exploring Principled Visualizations for Deep Network Attributions , 2019, IUI Workshops.

[18]  Aren Jansen,et al.  CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[20]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[21]  Avanti Shrikumar,et al.  Learning Important Features Through Propagating Activation Differences , 2017, ICML.

[22]  Aren Jansen,et al.  Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[23]  Scott Lundberg,et al.  A Unified Approach to Interpreting Model Predictions , 2017, NIPS.

[24]  Andrew P. Witkin,et al.  Scale-Space Filtering , 1983, IJCAI.

[25]  Tony Lindeberg,et al.  Scale-Space for Discrete Signals , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Anders Friberg,et al.  Scale-Space Theory for Auditory Signals , 2015, SSVM.

[27]  L. Shapley,et al.  Values of Non-Atomic Games , 1974 .

[28]  Tony Lindeberg,et al.  A computational theory of visual receptive fields , 2013, Biological Cybernetics.

[29]  Daniel P. W. Ellis,et al.  General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline , 2018, DCASE.

[30]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[32]  Motoaki Kawanabe,et al.  How to Explain Individual Classification Decisions , 2009, J. Mach. Learn. Res..

[33]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).