论文信息 - Gradients of Counterfactuals

Gradients of Counterfactuals

Gradients have been used to quantify feature importance in machine learning models. Unfortunately, in nonlinear deep networks, not only individual neurons but also the whole network can saturate, and as a result an important input feature can have a tiny gradient. We study various networks, and observe that this phenomena is indeed widespread, across many inputs. We propose to examine interior gradients, which are gradients of counterfactual inputs constructed by scaling down the original input. We apply our method to the GoogleNet architecture for object recognition in images, as well as a ligand-based virtual screening network with categorical features and an LSTM based language model for the Penn Treebank dataset. We visualize how interior gradients better capture feature importance. Furthermore, interior gradients are applicable to a wide variety of deep networks, and have the attribution property that the feature importance scores sum to the the prediction score. Best of all, interior gradients can be computed just as easily as gradients. In contrast, previous methods are complex to implement, which hinders practical adoption.

[1] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[2] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[3] Thomas Brox,et al. Inverting Visual Representations with Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Huan Liu,et al. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[5] Andrea Vedaldi,et al. Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Pascal Vincent,et al. Visualizing Higher-Layer Features of a Deep Network , 2009 .

[7] Hod Lipson,et al. Understanding Neural Networks Through Deep Visualization , 2015, ArXiv.

[8] Motoaki Kawanabe,et al. How to Explain Individual Classification Decisions , 2009, J. Mach. Learn. Res..

[9] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[10] Carlos Guestrin,et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[11] Vijay S. Pande,et al. Molecular graph convolutions: moving beyond fingerprints , 2016, Journal of Computer-Aided Molecular Design.

[12] Louis J. Billera,et al. Allocation of Shared Costs: A Set of Axioms Yielding A Unique Procedure , 1982, Math. Oper. Res..

[13] Anna Shcherbina,et al. Not Just a Black Box: Learning Important Features Through Propagating Activation Differences , 2016, ArXiv.

[14] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[15] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[16] Alexander Binder,et al. Evaluating the Visualization of What a Deep Neural Network Has Learned , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[17] Yi Sun,et al. Axiomatic attribution for multilinear functions , 2011, EC '11.

[18] L. Shapley,et al. Values of Non-Atomic Games , 1974 .

[19] Carlos Guestrin,et al. Model-Agnostic Interpretability of Machine Learning , 2016, ArXiv.

[20] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21] Xinyun Chen. Under Review as a Conference Paper at Iclr 2017 Delving into Transferable Adversarial Ex- Amples and Black-box Attacks , 2016 .

[22] Andrew Zisserman,et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[23] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.

[24] Alexander Binder,et al. Layer-Wise Relevance Propagation for Neural Networks with Local Renormalization Layers , 2016, ICANN.