Interpretable Explanations of Black Boxes by Meaningful Perturbation

As machine learning algorithms are increasingly applied to high-impact yet high-risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions. In recent years, a number of image saliency methods have been developed to summarize where highly complex neural networks “look” in an image for evidence supporting their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: first, we propose a general framework for learning different kinds of explanations for any black-box algorithm; second, we specialise the framework to find the part of an image most responsible for a classifier decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.
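To illustrate the perturbation-based idea in the abstract, below is a minimal PyTorch sketch that learns a mask over an image such that blurring the masked-out region most reduces the classifier's score for a target class. The classifier interface, the blur baseline, the helper name explain_with_mask, and all hyperparameters are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def explain_with_mask(model, image, target_class, steps=300, lr=0.1,
                      lambda_l1=1e-2, lambda_tv=1e-1):
    """Learn a coarse mask whose deleted region (mask -> 0) most reduces
    the classifier's score for `target_class`. All settings are illustrative."""
    model.eval()
    # Heavily blurred copy of the image, used as the perturbation baseline.
    blurred = F.avg_pool2d(image, kernel_size=11, stride=1, padding=5)

    # Low-resolution mask, upsampled to image size before use (keeps it smooth).
    mask = torch.full((1, 1, 28, 28), 0.5, requires_grad=True)
    optimizer = torch.optim.Adam([mask], lr=lr)

    for _ in range(steps):
        m = torch.sigmoid(5 * (mask - 0.5))  # keep mask values in (0, 1)
        m_up = F.interpolate(m, size=image.shape[-2:], mode='bilinear',
                             align_corners=False)
        # Blend: mask = 1 keeps the original pixel, mask = 0 inserts the blur.
        perturbed = m_up * image + (1 - m_up) * blurred
        score = F.softmax(model(perturbed), dim=1)[0, target_class]

        # Penalize large deleted regions and encourage a smooth mask,
        # while minimizing the target-class score of the perturbed image.
        l1 = (1 - m).abs().mean()
        tv = ((m[..., 1:, :] - m[..., :-1, :]).abs().mean()
              + (m[..., :, 1:] - m[..., :, :-1]).abs().mean())
        loss = score + lambda_l1 * l1 + lambda_tv * tv

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.sigmoid(5 * (mask.detach() - 0.5))

Because the optimization only queries the model's output on perturbed inputs, the same sketch applies to any differentiable black-box classifier, which is the sense in which the approach is model-agnostic.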
