Review of white box methods for explanations of convolutional neural networks in image classification tasks

Abstract. In recent years, deep learning has become the prevalent approach for solving problems across multiple domains. Convolutional neural networks (CNNs) in particular have demonstrated state-of-the-art performance on the task of image classification. However, the decisions made by these networks are not transparent and cannot be directly interpreted by a human. Several approaches have been proposed to explain the reasoning behind a prediction made by a network. We propose a taxonomy for grouping these methods based on their assumptions and implementations. We focus primarily on white box methods, which leverage information about the internal architecture of a network to explain its decisions. Given the task of image classification and a trained CNN, our work aims to provide a comprehensive and detailed overview of methods that create explanation maps for a particular image: maps that assign each pixel an importance score based on its contribution to the decision of the network. We also propose a further classification of the white box methods based on their implementations, to enable better comparisons and help researchers find the methods best suited to different scenarios.
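To make the notion of an explanation map concrete, the sketch below implements one of the simplest white box methods of this kind: the gradient-based saliency map of Simonyan et al. (2013), in which a pixel's importance score is the absolute gradient of the target class score with respect to that pixel. This is a minimal illustrative sketch, not a reference implementation of any reviewed method; the model choice (VGG-16), the preprocessing pipeline, the input file name, and the exact torchvision weights API are assumptions that depend on the installed library version.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing (an assumption made for illustration).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def gradient_saliency(model, image, target_class=None):
    """Per-pixel importance: |d(class score)/d(pixel)|, max over color channels."""
    x = preprocess(image).unsqueeze(0).requires_grad_(True)
    scores = model(x)  # unnormalized class scores (logits)
    if target_class is None:
        target_class = scores.argmax(dim=1).item()  # explain the predicted class
    # Backpropagate the target class score down to the input pixels.
    scores[0, target_class].backward()
    # A pixel's importance is its largest absolute gradient over the 3 channels.
    return x.grad.abs().max(dim=1)[0].squeeze(0)  # shape: (224, 224)

# Example usage: explain a pretrained VGG-16's prediction for a (hypothetical) image.
model = models.vgg16(weights="IMAGENET1K_V1").eval()
saliency = gradient_saliency(model, Image.open("cat.jpg").convert("RGB"))
```

The resulting tensor can be rendered as a heatmap over the input image; most of the white box methods surveyed here refine this basic idea, for example by smoothing the gradient, integrating it along a path, or propagating relevance layer by layer instead of raw gradients.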
