How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods

Explaining the inner workings of deep neural network models has received considerable attention in recent years. Researchers have attempted to provide human-parseable explanations that justify why a model made a specific classification. Although many such explanation toolkits are available, it remains unclear which style of explanation end-users prefer, a question that warrants empirical investigation. We conducted a cross-analysis Amazon Mechanical Turk study comparing popular state-of-the-art explanation methods to determine empirically which best explain model decisions. Participants were asked to compare explanation methods across applications spanning the image, text, audio, and sensory domains. Among the surveyed methods, explanation-by-example was preferred in all domains except text sentiment classification, where LIME's method of annotating the input text was preferred. We highlight qualitative aspects of employing the studied explainability methods and conclude with implications for researchers and engineers who seek to incorporate explanations into user-facing deployments.
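
To make the preferred text-domain style concrete, the sketch below shows how a LIME-style word-level annotation can be produced with the open-source `lime` package. It is a minimal illustration, not the study's experimental setup: the toy scikit-learn sentiment classifier, training sentences, and class names are placeholder assumptions standing in for the deep models and datasets actually evaluated.

```python
# Minimal sketch (illustrative, not the paper's setup): word-level LIME
# annotations for a text sentiment classifier.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy classifier standing in for the black-box model under explanation.
texts = ["great film, loved it", "wonderful and moving",
         "terrible plot", "boring and dull"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# LIME perturbs the input text and fits a local surrogate model; the
# per-word weights it returns are what drive the highlighted annotations
# shown to end-users.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "a wonderful but slightly boring film",
    clf.predict_proba,   # black-box prediction function: list of texts -> class probabilities
    num_features=5,      # number of words to annotate
)
print(explanation.as_list())  # [(word, weight), ...]
```

The (word, weight) pairs returned by `as_list()` correspond to the kind of word-level highlighting that participants in the text sentiment task compared against example-based explanations.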
