What made you do this? Understanding black-box decisions with sufficient input subsets

Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model. Existing techniques are often restricted to a specific type of predictor or based on input saliency, which may be undesirably sensitive to factors unrelated to the model's decision-making process. We instead propose sufficient input subsets that identify minimal subsets of features whose observed values alone suffice for the same decision to be reached, even if all other input feature values are missing. General principles that globally govern a model's decision-making can also be revealed by searching for clusters of such input patterns across many data points. Our approach is conceptually straightforward, entirely model-agnostic, simple to implement using instance-wise backward selection, and able to produce more concise rationales than existing techniques. We demonstrate the utility of our interpretation method on various neural network models trained on text, image, and genomic data.
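
The backward-selection idea can be sketched in a few lines of Python. The snippet below is a minimal illustration under stated assumptions, not the paper's exact procedure: the function name `sufficient_input_subset`, the `baseline` vector used to stand in for missing feature values, and the toy sum model are hypothetical names chosen for this example. It greedily masks the feature whose removal reduces the model's score the least, stopping when no further feature can be dropped without the score falling below the decision threshold; the features that remain unmasked form an (approximately minimal) sufficient input subset.

```python
import numpy as np

def sufficient_input_subset(f, x, baseline, threshold):
    """Greedy backward-selection sketch (illustrative only, not the
    authors' exact algorithm): repeatedly mask the feature whose
    removal hurts the prediction f the least, as long as the masked
    input still satisfies f(x_masked) >= threshold. The indices left
    unmasked form a sufficient input subset."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    active = set(range(len(x)))            # feature indices currently kept
    current = x.copy()

    while True:
        best_idx, best_score = None, -np.inf
        for i in active:
            candidate = current.copy()
            candidate[i] = baseline[i]      # treat feature i as "missing"
            score = f(candidate)
            if score >= threshold and score > best_score:
                best_idx, best_score = i, score
        if best_idx is None:                # dropping any remaining feature
            break                           # would violate the threshold
        current[best_idx] = baseline[best_idx]
        active.remove(best_idx)

    return sorted(active)                   # indices of the sufficient subset


# Toy usage: a "model" that simply sums its inputs.
if __name__ == "__main__":
    model = lambda z: float(z.sum())
    x = np.array([0.9, 0.1, 0.8, 0.05])
    subset = sufficient_input_subset(model, x,
                                     baseline=np.zeros_like(x),
                                     threshold=1.5)
    print(subset)   # [0, 2]: these two values alone keep the score >= 1.5
```

Replacing masked values with a fixed baseline is only one way to model "missing" features; the appropriate choice (zeros, a mean value, or a learned missingness token) depends on the predictor and data modality, and is an assumption of this sketch rather than a prescription from the paper.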
