Learning Explainable Models Using Attribution Priors

Two important topics in deep learning both involve incorporating humans into the modeling process: model priors transfer information from humans to a model by constraining the model's parameters, while model attributions transfer information from a model to humans by explaining the model's behavior. We propose connecting these topics with attribution priors (this https URL), which let humans use the common language of attributions to enforce prior expectations about a model's behavior during training. We develop a differentiable, axiomatic feature attribution method called expected gradients and show how to regularize these attributions directly during training. We demonstrate the broad applicability of attribution priors ($\Omega$) by presenting three distinct examples that regularize models to behave more intuitively in three different domains: 1) on image data, $\Omega_{\textrm{pixel}}$ encourages models to have piecewise-smooth attribution maps; 2) on gene expression data, $\Omega_{\textrm{graph}}$ encourages models to treat functionally related genes similarly; 3) on a health care dataset, $\Omega_{\textrm{sparse}}$ encourages models to rely on fewer features. In all three domains, attribution priors produce models with more intuitive behavior and better generalization performance by encoding constraints that would otherwise be very difficult to encode using standard model priors.
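Concretely, training with an attribution prior minimizes a combined objective $L(\theta) = \mathcal{L}(\theta; X, y) + \lambda\,\Omega(\Phi(\theta, X))$, where $\Phi$ denotes the expected gradients attributions $\Phi_i(x) = \mathbb{E}_{x' \sim D,\, \alpha \sim U(0,1)}\big[(x_i - x'_i)\,\partial f(x' + \alpha(x - x'))/\partial x_i\big]$. The sketch below illustrates one way this can be wired up; it is a minimal PyTorch illustration rather than the authors' implementation, and the names `expected_gradients`, `total_variation_prior`, `background`, and `lam` are assumptions introduced here.

```python
# Minimal sketch (assumed names, not the paper's code): one training step that
# penalizes expected gradients attributions with a pixel-smoothness prior.
import torch
import torch.nn.functional as F

def expected_gradients(model, x, background, n_samples=1):
    # Monte Carlo estimate of expected gradients:
    # EG_i(x) = E[(x_i - x'_i) * d f(x' + a(x - x')) / dx_i],
    # with references x' drawn from the background data and a ~ Uniform(0, 1).
    attributions = torch.zeros_like(x)
    for _ in range(n_samples):
        idx = torch.randint(0, background.shape[0], (x.shape[0],), device=x.device)
        ref = background[idx]
        alpha = torch.rand(x.shape[0], *([1] * (x.dim() - 1)), device=x.device)
        interp = (ref + alpha * (x - ref)).requires_grad_(True)
        # Sum of outputs as the attributed scalar; in practice one would
        # attribute the target-class logit instead.
        out = model(interp).sum()
        # create_graph=True lets the prior's gradient flow back into the weights.
        grads = torch.autograd.grad(out, interp, create_graph=True)[0]
        attributions = attributions + (x - ref) * grads
    return attributions / n_samples

def total_variation_prior(attr):
    # Omega_pixel: anisotropic total variation of the attribution map,
    # encouraging piecewise-smooth attributions over [..., H, W] inputs.
    dh = (attr[..., 1:, :] - attr[..., :-1, :]).abs().sum()
    dw = (attr[..., :, 1:] - attr[..., :, :-1]).abs().sum()
    return dh + dw

def train_step(model, optimizer, x, y, background, lam=0.1):
    # Combined objective: task loss plus the attribution prior penalty.
    optimizer.zero_grad()
    task_loss = F.cross_entropy(model(x), y)
    attr = expected_gradients(model, x, background)
    loss = task_loss + lam * total_variation_prior(attr)
    loss.backward()  # second-order gradients flow through the attributions
    optimizer.step()
    return loss.item()
```

A single reference sample per example per step is typically sufficient, since minibatch SGD averages the Monte Carlo noise over training; substituting a graph Laplacian penalty or a sparsity measure for `total_variation_prior` would give sketches of the $\Omega_{\textrm{graph}}$ and $\Omega_{\textrm{sparse}}$ variants described above.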
