Evaluating Feature Importance Estimates

Estimating the influence of a given input feature on a model's prediction is challenging. We introduce ROAR (RemOve And Retrain), a benchmark for evaluating the accuracy of interpretability methods that estimate input feature importance in deep neural networks. We remove a fraction of the input features deemed most important by each estimator and measure the change in model accuracy after retraining. The most accurate estimator is the one whose selected features, once removed, cause the largest drop in model performance relative to all other estimators. This evaluation produces thought-provoking results: we find that several estimators are less accurate than a random assignment of feature importance. However, averaging a set of squared noisy estimates (a variant of the technique proposed by Smilkov et al. (2017)) leads to significant gains in accuracy for each method considered and far outperforms a random guess.
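
To make the procedure concrete, here is a minimal sketch of the two ingredients described above: degrading images by replacing the estimator's top-ranked pixels with an uninformative value, and forming a squared, noise-averaged saliency estimate in the spirit of Smilkov et al. (2017). The array shapes, the per-channel-mean replacement value, the noise scale, the removal fractions, and the train_model/evaluate helpers are assumptions made for illustration, not the exact implementation used in the paper.

    import numpy as np

    def roar_degrade(images, importance_maps, fraction):
        # Replace the top `fraction` of pixels, ranked by the estimator's
        # importance map, with the per-channel mean of the dataset so that
        # the information those pixels carried is removed.
        #   images:          float array of shape (N, H, W, C)
        #   importance_maps: array of shape (N, H, W), one score per pixel
        #   fraction:        share of pixels to remove, e.g. 0.1 for 10%
        degraded = images.copy()
        n_remove = int(fraction * importance_maps[0].size)
        channel_mean = images.mean(axis=(0, 1, 2))              # shape (C,)
        for img, imp in zip(degraded, importance_maps):
            top = np.argsort(imp, axis=None)[::-1][:n_remove]   # most important first
            rows, cols = np.unravel_index(top, imp.shape)
            img[rows, cols] = channel_mean                      # broadcasts over channels
        return degraded

    def smoothgrad_squared(saliency_fn, x, n_samples=15, noise_scale=0.15):
        # Average squared saliency maps over noisy copies of a single input x,
        # i.e. the squared variant of SmoothGrad referenced in the abstract.
        span = float(x.max() - x.min())
        noisy_maps = [saliency_fn(x + np.random.normal(0.0, noise_scale * span, x.shape)) ** 2
                      for _ in range(n_samples)]
        return np.mean(noisy_maps, axis=0)

    # Sketch of the benchmark loop (train_model and evaluate are assumed helpers):
    # for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
    #     model = train_model(roar_degrade(train_x, train_maps, frac), train_y)
    #     accuracy = evaluate(model, roar_degrade(test_x, test_maps, frac), test_y)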

[1] Deborah Silver et al. Feature Visualization, 1994, Scientific Visualization.

[2] Jason Weston et al. Gene Selection for Cancer Classification using Support Vector Machines, 2002, Machine Learning.

[3] Geoffrey E. Hinton et al. Visualizing Data using t-SNE, 2008, J. Mach. Learn. Res.

[4] Fei-Fei Li et al. ImageNet: A large-scale hierarchical image database, 2009, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Motoaki Kawanabe et al. How to Explain Individual Classification Decisions, 2009, J. Mach. Learn. Res.

[6] Rob Fergus et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.

[7] Matthieu Guillaumin et al. Food-101 - Mining Discriminative Components with Random Forests, 2014, ECCV.

[8] Seung Woo Lee et al. Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds, 2014, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Rich Caruana et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.

[10] Thomas Brox et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.

[11] Alexander Binder et al. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation, 2015, PLoS ONE.

[12] Berkeley J. Dietvorst et al. Algorithm Aversion: People Erroneously Avoid Algorithms after Seeing Them Err, 2014, Journal of Experimental Psychology: General.

[13] Bolei Zhou et al. Object Detectors Emerge in Deep Scene CNNs, 2014, ICLR.

[14] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[15] Anna Shcherbina et al. Not Just a Black Box: Learning Important Features Through Propagating Activation Differences, 2016, arXiv.

[16] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Kaiming He et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, arXiv.

[18] Avanti Shrikumar et al. Learning Important Features Through Propagating Activation Differences, 2017, ICML.

[19] Ramprasaath R. Selvaraju et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2017, IEEE International Conference on Computer Vision (ICCV).

[20] Max Welling et al. Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, 2017, ICLR.

[21] Samy Bengio et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.

[22] Alexander Binder et al. Evaluating the Visualization of What a Deep Neural Network Has Learned, 2015, IEEE Transactions on Neural Networks and Learning Systems.

[23] Zhe L. Lin et al. Top-Down Neural Attention by Excitation Backprop, 2016, International Journal of Computer Vision.

[24] Andrea Vedaldi et al. Interpretable Explanations of Black Boxes by Meaningful Perturbation, 2017, IEEE International Conference on Computer Vision (ICCV).

[25] Andrew Slavin Ross et al. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations, 2017, IJCAI.

[26] Yarin Gal et al. Real Time Image Saliency for Black Box Classifiers, 2017, NIPS.

[27] Klaus-Robert Müller et al. PatternNet and PatternLRP - Improving the interpretability of neural networks, 2017, arXiv.

[28] Jascha Sohl-Dickstein et al. SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability, 2017, NIPS.

[29] Geoffrey E. Hinton et al. Distilling a Neural Network Into a Soft Decision Tree, 2017, CEx@AI*IA.

[30] Alexander Binder et al. Explaining nonlinear classification decisions with deep Taylor decomposition, 2015, Pattern Recognition.

[31] Ankur Taly et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[32] Martin Wattenberg et al. SmoothGrad: removing noise by adding noise, 2017, arXiv.

[33] Mike Wu et al. Beyond Sparsity: Tree Regularization of Deep Models for Interpretability, 2017, AAAI.

[34] Klaus-Robert Müller et al. Learning how to explain neural networks: PatternNet and PatternAttribution, 2017, ICLR.

[35] Matthew Botvinick et al. On the importance of single directions for generalization, 2018, ICLR.

[36] Martin Wattenberg et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), 2017, ICML.

[37] Been Kim et al. Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values, 2018, ICLR.

[38] Been Kim et al. Sanity Checks for Saliency Maps, 2018, NeurIPS.

[39] Samuel J. Gershman et al. Human-in-the-Loop Interpretability Prior, 2018, NeurIPS.

[40] Andrew Slavin Ross et al. Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients, 2017, AAAI.

[41] Cengiz Öztireli et al. Towards better understanding of gradient-based attribution methods for Deep Neural Networks, 2017, ICLR.

[42] Bolei Zhou et al. Revisiting the Importance of Individual Units in CNNs via Ablation, 2018, arXiv.

[43] Dumitru Erhan et al. The (Un)reliability of saliency methods, 2017, Explainable AI.

[44] Quoc V. Le et al. Do Better ImageNet Models Transfer Better?, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Daniel G. Goldstein et al. Manipulating and Measuring Model Interpretability, 2018, CHI.