How to Explain Neural Networks: An Approximation Perspective

The lack of interpretability has hindered the large-scale adoption of AI technologies. However, what interpretability fundamentally means, and how to realize it in practice, remains unclear. In this work, we develop notions of interpretability grounded in approximation theory. We first instantiate this approximation-based interpretation on a specific model, the fully connected neural network, and then propose using the multilayer perceptron (MLP) as a universal interpreter to explain arbitrary black-box models. Extensive experiments demonstrate the effectiveness of our approach.
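To make the surrogate idea concrete, the sketch below trains a small MLP to mimic a black-box model's input-output behavior and then measures how faithfully it approximates the black box. This is a minimal illustration only, not the paper's exact procedure: the choice of black box (a random forest stand-in), the sampling scheme, the MLP architecture, and the fidelity metric are all illustrative assumptions.

```python
# Minimal sketch of the MLP-as-universal-interpreter idea: fit an MLP
# surrogate to a black-box model's outputs, then check approximation
# fidelity. All model choices and hyperparameters here are assumptions
# for illustration, not the setup described in the paper.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

# A stand-in black-box model trained on some task.
X, y = make_regression(n_samples=2000, n_features=10, noise=0.1, random_state=0)
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Query the black box on fresh samples from the input domain;
# the surrogate is trained to approximate these responses.
rng = np.random.default_rng(1)
X_query = rng.normal(size=(5000, 10))
y_query = black_box.predict(X_query)

# Fit the MLP surrogate (the would-be "universal interpreter").
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=0).fit(X_query, y_query)

# Fidelity: agreement between surrogate and black box on held-out inputs.
X_test = np.random.default_rng(2).normal(size=(1000, 10))
print("fidelity R^2:", r2_score(black_box.predict(X_test),
                                surrogate.predict(X_test)))
```

Once the surrogate attains high fidelity, any analysis available for MLPs (e.g., inspecting its weights or local linear behavior) can stand in for an explanation of the original black box, which is the sense in which the MLP acts as an interpreter.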
