Interpreting Black Box Models via Hypothesis Testing

In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments. In such high-stakes tasks, false discoveries may lead investigators astray. These applications would therefore benefit from control over the finite-sample error rate of interpretations. We reframe black box model interpretability as a multiple hypothesis testing problem. The task is to discover "important" features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with uninformative counterfactuals. We propose two testing methods: one that provably controls the false discovery rate but which is not yet feasible for large-scale applications, and an approximate testing method which can be applied to real-world data sets. In simulation, both tests have high power relative to existing interpretability methods. When applied to state-of-the-art vision and language models, the framework selects features that intuitively explain model predictions. The resulting explanations have the additional advantage that they are themselves easy to interpret.
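As a concrete illustration of the testing recipe described above, the sketch below scores each feature of a single input by comparing the model's prediction against predictions obtained when that feature is replaced with uninformative counterfactual draws, then applies the Benjamini-Hochberg procedure to control the false discovery rate across features. This is a minimal sketch of the general idea only, not the paper's actual procedures; the toy model, the counterfactual sampler, and the one-sided test statistic are all simplifying assumptions introduced here for illustration.

```python
import numpy as np

def feature_pvalue(model, x, j, sample_counterfactual, n_draws=200):
    """One-sided Monte Carlo p-value for the importance of feature j in input x.

    Null hypothesis: the model's prediction is no larger than what would be
    expected if feature j were replaced with an uninformative counterfactual.
    """
    observed = model(x)
    exceed = 0
    for _ in range(n_draws):
        x_null = x.copy()
        x_null[j] = sample_counterfactual(j)   # swap in an uninformative value
        if model(x_null) >= observed:          # ties count toward the null
            exceed += 1
    # add-one correction keeps the Monte Carlo p-value valid in finite samples
    return (1 + exceed) / (1 + n_draws)

def benjamini_hochberg(pvals, alpha=0.1):
    """Indices of features declared important at FDR level alpha (BH step-up)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(passed)[0])          # largest rank meeting the BH criterion
    return np.sort(order[:k + 1])

# Toy black box: only the first two of ten features drive the prediction.
rng = np.random.default_rng(0)
model = lambda x: 1.0 / (1.0 + np.exp(-(3.0 * x[0] + 3.0 * x[1])))
x = np.concatenate([[3.0, 3.0], rng.normal(size=8)])
sample_counterfactual = lambda j: rng.normal()  # null: replace with a standard normal draw
pvals = [feature_pvalue(model, x, j, sample_counterfactual) for j in range(len(x))]
print("features selected at FDR 0.1:", benjamini_hochberg(pvals))
```

On this toy example the two informative features receive small p-values while the uninformative ones do not, so the BH step-up typically selects only the first two; in the paper's setting, the counterfactual sampler would be replaced by a more realistic generator of uninformative feature values (e.g., an inpainting model for images).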
