Debugging Tests for Model Explanations

We investigate whether post-hoc model explanations are effective for diagnosing model errors, a task we refer to as \textit{model debugging}. A vast array of explanation methods has been proposed to address the challenge of explaining a model's predictions, yet despite their increasing use, it remains unclear whether they are effective for this purpose. We begin by categorizing \textit{bugs}, based on their source, into \textit{data}, \textit{model}, and \textit{test-time} contamination bugs. For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination). We find that the methods tested can diagnose a spurious background bug, but cannot conclusively identify mislabeled training examples. In addition, a class of methods that modify the back-propagation algorithm is invariant to the higher-layer parameters of a deep network, and is therefore ineffective for diagnosing model contamination. We complement our analysis with a human-subject study and find that participants fail to identify defective models using attributions, relying primarily on model predictions instead. Taken together, our results provide guidance for practitioners and researchers turning to explanations as tools for model debugging.
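
As a concrete illustration of the model-contamination test described above, the sketch below compares attribution maps from a trained network against those from a copy whose top layer has been re-initialized; an attribution method that produces nearly identical maps for both models cannot flag this kind of bug. The attribution method (plain input gradients), architecture (a torchvision ResNet-18), and similarity metric (Spearman rank correlation) are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a model-contamination check: compare attributions from a
# trained network against a copy whose final layer has been re-initialized.
# Vanilla input gradients and Spearman rank correlation are illustrative
# choices, not necessarily the methods or metrics used in the paper.
import copy

import torch
import torchvision.models as models
from scipy.stats import spearmanr


def input_gradient(model, x, target):
    """Vanilla gradient attribution: d(logit_target) / d(input)."""
    x = x.clone().requires_grad_(True)
    model.zero_grad()
    logits = model(x)
    logits[0, target].backward()
    return x.grad.detach().abs().sum(dim=1)  # aggregate over color channels


trained = models.resnet18(weights="IMAGENET1K_V1").eval()

# "Model contamination": re-initialize only the top (fully connected) layer.
contaminated = copy.deepcopy(trained)
torch.nn.init.normal_(contaminated.fc.weight, std=0.01)
torch.nn.init.zeros_(contaminated.fc.bias)

x = torch.randn(1, 3, 224, 224)  # stand-in for a real input image
with torch.no_grad():
    target = trained(x).argmax(dim=1).item()

attr_trained = input_gradient(trained, x, target).flatten()
attr_contaminated = input_gradient(contaminated, x, target).flatten()

# If an attribution method is insensitive to the re-initialized parameters,
# the two maps remain highly correlated, signaling that the method cannot
# be used to diagnose this model bug.
rho, _ = spearmanr(attr_trained.numpy(), attr_contaminated.numpy())
print(f"Spearman rank correlation between attribution maps: {rho:.3f}")
```

The same comparison can be repeated while re-initializing successively deeper layers, which is how insensitivity to higher-layer parameters becomes visible: the attribution maps barely change even as the model's predictions degrade.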
