Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning

There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insight into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions at some level of detail. These explanations are important for ensuring algorithmic fairness, identifying potential bias or problems in the training data, and verifying that algorithms perform as expected. However, the explanations produced by these systems are neither standardized nor systematically assessed. In an effort to establish best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify the existing literature. We discuss why current approaches to explanatory methods, especially for deep neural networks, are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence.
