On Interpretability of Artificial Neural Networks: A Survey

Deep learning, as performed by artificial deep neural networks (DNNs), has recently achieved great success in many important areas dealing with text, images, videos, graphs, and so on. However, the black-box nature of DNNs has become one of the primary obstacles to their wide adoption in mission-critical applications such as medical diagnosis and therapy. Because of the huge potential of deep learning, the interpretability of DNNs has recently attracted much research attention. In this article, we propose a simple but comprehensive taxonomy for interpretability, systematically review recent studies on the interpretability of neural networks, describe applications of interpretability in medicine, and discuss future research directions, such as connections to fuzzy logic and brain science.
[106]  Harsh Jhamtani,et al.  SPINE: SParse Interpretable Neural Embeddings , 2017, AAAI.

[107]  Ali Farhadi,et al.  Towards Transparent Systems: Semantic Characterization of Failure Modes , 2014, ECCV.

[108]  Ge Wang,et al.  Fuzzy logic interpretation of quadratic networks , 2018, Neurocomputing.

[109]  Vinay P. Namboodiri,et al.  Differential Attention for Visual Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[110]  Ziyan Wu,et al.  Counterfactual Visual Explanations , 2019, ICML.

[111]  Ali Farhadi,et al.  Predicting Failures of Vision Systems , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[112]  Andrea Vedaldi,et al.  Interpretable Explanations of Black Boxes by Meaningful Perturbation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[113]  Bolei Zhou,et al.  Network Dissection: Quantifying Interpretability of Deep Visual Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[114]  Quanshi Zhang,et al.  Interpretable Convolutional Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[115]  Xi Fang,et al.  Multi-Organ Segmentation Over Partially Labeled Datasets With Multi-Scale Feature Abstraction , 2020, IEEE Transactions on Medical Imaging.

[116]  José Carlos Príncipe,et al.  Understanding Autoencoders with Information Theoretic Concepts , 2018, Neural Networks.

[117]  Andrew Slavin Ross,et al.  Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations , 2017, IJCAI.

[118]  Anima Anandkumar,et al.  Efficient approaches for escaping higher order saddle points in non-convex optimization , 2016, COLT.

[119]  Gustavo K. Rohde,et al.  Neural Networks, Hypersurfaces, and Radon Transforms , 2019, ArXiv.

[120]  R. Srikant,et al.  Why Deep Neural Networks for Function Approximation? , 2016, ICLR.

[121]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[122]  Chris Bishop,et al.  Improving the Generalization Properties of Radial Basis Function Neural Networks , 1991, Neural Computation.

[123]  Gregory D. Hager,et al.  Deep Supervision with Intermediate Concepts , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[124]  Been Kim,et al.  Sanity Checks for Saliency Maps , 2018, NeurIPS.

[125]  Jong Chul Ye,et al.  Deep Convolutional Framelets: A General Deep Learning Framework for Inverse Problems , 2017, SIAM J. Imaging Sci..

[126]  Samuel J. Gershman,et al.  Human-in-the-Loop Interpretability Prior , 2018, NeurIPS.

[127]  LiMin Fu,et al.  Rule Generation from Neural Networks , 1994, IEEE Trans. Syst. Man Cybern. Syst..

[128]  Grzegorz Chrupala,et al.  Representation of Linguistic Form and Function in Recurrent Neural Networks , 2016, CL.

[129]  Bolei Zhou,et al.  Understanding the role of individual units in a deep neural network , 2020, Proceedings of the National Academy of Sciences.

[130]  Matus Telgarsky,et al.  Spectrally-normalized margin bounds for neural networks , 2017, NIPS.

[131]  John J. Hopfield,et al.  Unsupervised learning by competing hidden units , 2018, Proceedings of the National Academy of Sciences.

[132]  Ge Wang,et al.  Learning From Pseudo-Randomness With an Artificial Neural Network–Does God Play Pseudo-Dice? , 2018, IEEE Access.

[133]  Shi Feng,et al.  Interpreting Neural Networks with Nearest Neighbors , 2018, BlackboxNLP@EMNLP.

[134]  T. Poggio,et al.  Deep vs. shallow networks : An approximation theory perspective , 2016, ArXiv.

[135]  Tong Wang,et al.  Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute , 2019, ICML.

[136]  Bolei Zhou,et al.  Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.

[137]  Tony Lindeberg,et al.  A computational theory of visual receptive fields , 2013, Biological Cybernetics.

[138]  Eric Horvitz,et al.  Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration , 2016, AAAI.

[139]  Lin Yang,et al.  MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[140]  Furong Huang,et al.  Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.

[141]  David J. Schwab,et al.  An exact mapping between the Variational Renormalization Group and Deep Learning , 2014, ArXiv.

[142]  Osbert Bastani,et al.  Interpretability via Model Extraction , 2017, ArXiv.

[143]  Emil Pitkin,et al.  Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation , 2013, 1309.6392.

[144]  John David N. Dionisio,et al.  Case-based explanation of non-case-based learning methods , 1999, AMIA.

[145]  Ohad Shamir,et al.  The Power of Depth for Feedforward Neural Networks , 2015, COLT.

[146]  Eric P. Xing,et al.  Harnessing Deep Neural Networks with Logic Rules , 2016, ACL.

[147]  Denise R. Aberle,et al.  An Interpretable Deep Hierarchical Semantic Convolutional Neural Network for Lung Nodule Malignancy Classification , 2018, Expert Syst. Appl..

[148]  Ge Wang,et al.  Soft Autoencoder and Its Wavelet Adaptation Interpretation , 2020, IEEE Transactions on Computational Imaging.

[149]  S. Snyder,et al.  Adenosine as a neuromodulator. , 1985, Annual review of neuroscience.

[150]  Loïc Le Folgoc,et al.  Attention U-Net: Learning Where to Look for the Pancreas , 2018, ArXiv.

[151]  Shi Feng,et al.  Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation , 2019, ICML.

[152]  Haiping Huang,et al.  Mechanisms of dimensionality reduction and decorrelation in deep neural networks , 2018, Physical Review E.

[153]  Yair Zick,et al.  Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[154]  Chandan Singh,et al.  Hierarchical interpretations for neural network predictions , 2018, ICLR.

[155]  Janet L. Kolodner,et al.  An introduction to case-based reasoning , 1992, Artificial Intelligence Review.

[156]  S. Lipovetsky,et al.  Analysis of regression in game theory approach , 2001 .

[157]  Nabile M. Safdar,et al.  Ethics of Artificial Intelligence in Radiology: Summary of the Joint European and North American Multisociety Statement , 2019, Canadian Association of Radiologists journal = Journal l'Association canadienne des radiologistes.

[158]  Cynthia Rudin,et al.  Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , 2018, Nature Machine Intelligence.

[159]  Hongming Shan,et al.  Shape and margin-aware lung nodule classification in low-dose CT images via soft activation mapping , 2020, Medical Image Anal..

[160]  S. Bozinovski,et al.  Using EEG alpha rhythm to control a mobile robot , 1988, Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society.

[161]  Kewei Tu,et al.  Cold-start and Interpretability: Turning Regular Expressions into Trainable Recurrent Neural Networks , 2020, EMNLP.

[162]  Seth Flaxman,et al.  European Union Regulations on Algorithmic Decision-Making and a "Right to Explanation" , 2016, AI Mag..

[163]  Alun D. Preece,et al.  Interpretability of deep learning models: A survey of results , 2017, 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI).

[164]  Daniel Rueckert,et al.  Learning Interpretable Anatomical Features Through Deep Generative Models: Application to Cardiac Remodeling , 2018, MICCAI.

[165]  Lotfi A. Zadeh,et al.  Fuzzy Logic , 2009, Encyclopedia of Complexity and Systems Science.

[166]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[167]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[168]  Zachary Chase Lipton The mythos of model interpretability , 2016, ACM Queue.

[169]  Lalana Kagal,et al.  Explaining Explanations: An Overview of Interpretability of Machine Learning , 2018, 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA).

[170]  Michio Sugeno,et al.  Fuzzy identification of systems and its applications to modeling and control , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[171]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[172]  Avanti Shrikumar,et al.  Not Just A Black Box : Interpretable Deep Learning by Propagating Activation Differences , 2016 .

[173]  Alexander Binder,et al.  Explaining nonlinear classification decisions with deep Taylor decomposition , 2015, Pattern Recognit..

[174]  Hod Lipson,et al.  Convergent Learning: Do different neural networks learn the same representations? , 2015, FE@NIPS.

[175]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[176]  Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI , 2019, Inf. Fusion.

[177]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[178]  Marko Robnik-Sikonja,et al.  Explaining Classifications For Individual Instances , 2008, IEEE Transactions on Knowledge and Data Engineering.

[179]  Cynthia Rudin,et al.  An Interpretable Model with Globally Consistent Explanations for Credit Risk , 2018, ArXiv.

[180]  Yan Liu,et al.  Interpretable Deep Models for ICU Outcome Prediction , 2016, AMIA.

[181]  Boyang Li,et al.  Does Interpretability of Neural Networks Imply Adversarial Robustness? , 2019, ArXiv.

[182]  Thomas Brox,et al.  Synthesizing the preferred inputs for neurons in neural networks via deep generator networks , 2016, NIPS.

[183]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[184]  Lixin Fan,et al.  Revisit Fuzzy Neural Network: Demystifying Batch Normalization and ReLU with Generalized Hamming Network , 2017, NIPS.

[185]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[186]  Michael C. Mozer,et al.  Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning , 2018, NeurIPS.

[187]  Jairo A. Gutiérrez,et al.  ISeeU: Visually interpretable deep learning for mortality prediction inside the ICU , 2019, J. Biomed. Informatics.

[188]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[189]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[190]  Quanshi Zhang,et al.  Towards a Deep and Unified Understanding of Deep Neural Models in NLP , 2019, ICML.

[191]  Suresh Venkatasubramanian,et al.  Auditing black-box models for indirect influence , 2016, Knowledge and Information Systems.

[192]  Bo Li,et al.  Towards Interpretable R-CNN by Unfolding Latent Structures , 2017, 1711.05226.

[193]  Klaus-Robert Müller,et al.  Interpretable deep neural networks for single-trial EEG classification , 2016, Journal of Neuroscience Methods.

[194]  Markus H. Gross,et al.  Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation , 2019, ICML.

[195]  Olcay Boz,et al.  Extracting decision trees from trained neural networks , 2002, KDD.

[196]  Klaus-Robert Müller,et al.  Explaining Recurrent Neural Network Predictions in Sentiment Analysis , 2017, WASSA@EMNLP.

[197]  Ankur Taly,et al.  Axiomatic Attribution for Deep Networks , 2017, ICML.

[198]  Bernd Bischl,et al.  Visualizing the Feature Importance for Black Box Models , 2018, ECML/PKDD.

[199]  Martin Wattenberg,et al.  SmoothGrad: removing noise by adding noise , 2017, ArXiv.

[200]  Franco Turini,et al.  A Survey of Methods for Explaining Black Box Models , 2018, ACM Comput. Surv..

[201]  Mikhail Belkin,et al.  Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.

[202]  Knowledge-Based Analysis for Mortality Prediction From CT Images , 2019, IEEE Journal of Biomedical and Health Informatics.

[203]  Frédéric Maire,et al.  On the convergence of validity interval analysis , 2000, IEEE Trans. Neural Networks Learn. Syst..

[204]  Adel Javanmard,et al.  Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks , 2017, IEEE Transactions on Information Theory.

[205]  Marcus A. Badgeley,et al.  Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study , 2018, PLoS medicine.

[206]  Shing-Tung Yau,et al.  A Geometric View of Optimal Transportation and Generative Model , 2017, Comput. Aided Geom. Des..

[207]  Deborah Silver,et al.  Feature Visualization , 1994, Scientific Visualization.

[208]  Franco Turini,et al.  Local Rule-Based Explanations of Black Box Decision Systems , 2018, ArXiv.

[209]  Arthur Jacot,et al.  Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.

[210]  Elmar Kotter,et al.  Ethics of artificial intelligence in radiology: summary of the joint European and North American multisociety statement , 2019, Insights into Imaging.

[211]  Detlef Nauck,et al.  A Fuzzy Perceptron as a Generic Model for Neuro-Fuzzy Approaches , 1994 .

[212]  Stefanie Jegelka,et al.  ResNet with one-neuron hidden layers is a Universal Approximator , 2018, NeurIPS.

[213]  Matthias Hein,et al.  The Loss Surface of Deep and Wide Neural Networks , 2017, ICML.

[214]  L. Kadanoff Variational Principles and Approximate Renormalization Group Calculations , 1975 .