Representativity & Consistency Measures for Deep Neural Network Explanations

The adoption of machine learning in critical contexts requires reliable explanations of why an algorithm makes certain predictions. To address this need, many methods have been proposed to explain the predictions of these black-box models. Despite this wide choice of methods, little effort has been made to ensure that the explanations produced are objectively relevant. While it is possible to establish a number of desirable properties of a good explanation, evaluating them is more difficult. In particular, no measures are currently associated with the properties of consistency and generalization of explanations. We introduce a new procedure to compute two new measures, Relative Consistency (ReCo) and Mean Generalization (MeGe), for the consistency and the generalization of explanations, respectively. Our results on several image classification datasets, using progressively degraded models, empirically validate the reliability of these measures. We compare the results with those obtained by existing measures. Finally, we demonstrate the potential of the measures by applying them to different families of models, revealing an interesting link between gradient-based explanation methods and 1-Lipschitz networks.
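
The abstract only names the procedure, so the following is an illustrative sketch rather than the paper's definition of ReCo or MeGe. It assumes the general recipe described above: train models on different data splits, explain the same inputs with a gradient-based attribution, and compare the resulting explanations (here with Spearman rank correlation). The toy data, the linear stand-in model, and the helper names (train_logreg, saliency) are all hypothetical.

# Illustrative sketch only -- not the paper's exact ReCo / MeGe definitions.
# General recipe: train several models on different data splits, explain the
# same inputs with each, and compare the explanations across models.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy two-class problem with 20 features (hypothetical stand-in data).
X = rng.normal(size=(600, 20))
w_true = rng.normal(size=20)
y = (X @ w_true > 0).astype(int)

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain logistic regression by gradient descent (stand-in for a DNN)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def saliency(w, x):
    """Gradient-times-input attribution for a linear logit: d(w.x)/dx = w."""
    return np.abs(w * x)

# Two models trained on disjoint halves of the data (the "different splits").
w_a = train_logreg(X[:300], y[:300])
w_b = train_logreg(X[300:], y[300:])

# Held-out probe points, explained by both models.
probes = rng.normal(size=(50, 20))
sims, agree = [], []
for x in probes:
    rho, _ = spearmanr(saliency(w_a, x), saliency(w_b, x))
    sims.append(rho)
    agree.append((w_a @ x > 0) == (w_b @ x > 0))

sims, agree = np.array(sims), np.array(agree)
# Generalization-style score: average explanation similarity across models.
print("mean explanation similarity:", sims.mean())
# Consistency-style check: are explanations more alike when predictions agree?
print("similarity | agree:", sims[agree].mean(),
      "| disagree:", sims[~agree].mean() if (~agree).any() else float("nan"))

In this hypothetical setup, the mean cross-model similarity plays the role of a generalization score, and the gap between the "agree" and "disagree" cases hints at consistency; the paper's actual measures are built on such cross-model comparisons but are not reproduced here.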
