Breaking Batch Normalization for better explainability of Deep Neural Networks through Layer-wise Relevance Propagation

The lack of transparency of neural networks remains a major obstacle to their adoption. The Layer-wise Relevance Propagation (LRP) technique builds heat-maps that represent the relevance of each input to the model's decision. The relevance is propagated backward from the last layer to the first layer of the Deep Neural Network. Layer-wise Relevance Propagation does not handle normalization layers; in this work we propose a method to include them. Specifically, we build an equivalent network that fuses normalization layers with convolutional or fully connected layers. Heat-maps obtained with our method on the MNIST and CIFAR-10 datasets are more accurate for convolutional layers. Our study also warns against using Layer-wise Relevance Propagation with networks that combine fully connected layers and normalization layers.
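The abstract does not spell out the fusion itself, but the standard inference-time folding of a batch-normalization layer into the preceding convolution is a plausible reading of "an equivalent network fusing normalization layers and convolutional layers". The sketch below (PyTorch; the helper name `fuse_conv_bn` is hypothetical) scales the convolution weights by gamma / sqrt(running_var + eps) and adjusts the bias accordingly, so the fused layer computes the same function as the conv followed by BN in eval mode:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an inference-mode BatchNorm2d into the preceding Conv2d."""
    # Per-output-channel scale: gamma / sqrt(running_var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)

    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels,
        kernel_size=conv.kernel_size, stride=conv.stride,
        padding=conv.padding, dilation=conv.dilation,
        groups=conv.groups, bias=True,
    )
    with torch.no_grad():
        # W' = scale * W, broadcast over the output-channel axis
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        # b' = scale * (b - running_mean) + beta
        b = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_(scale * (b - bn.running_mean) + bn.bias)
    return fused

# Quick sanity check: the fused layer matches conv -> BN in eval mode.
conv, bn = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8)
bn.eval()
x = torch.randn(1, 3, 16, 16)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```

Once every conv-BN pair is fused this way, relevance can be propagated through the resulting network of plain convolutional and fully connected layers with the usual LRP rules; the same algebra applies to a fully connected layer followed by batch normalization.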
