Interpreting Undesirable Pixels for Image Classification on Black-Box Models

In an effort to interpret black-box models, research on explanation methods has progressed in recent years. Most studies have tried to identify the input pixels that are crucial to a classifier's prediction. While this approach is useful for analyzing the characteristics of black-box models, it is also important to investigate the pixels that interfere with the prediction. To tackle this issue, in this paper, we propose an explanation method that visualizes the regions that are undesirable for classifying an image as a target class. Specifically, we divide the concept of undesirable regions into two terms: (1) factors for the target class, which hinder the black-box model from identifying the intrinsic characteristics of the target class, and (2) factors for non-target classes, which are regions important for the image to be classified as other classes. We visualize such undesirable regions as heatmaps to qualitatively validate the proposed method. Furthermore, we present an evaluation metric that provides quantitative results on ImageNet.
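To make the two notions concrete, the following is a minimal sketch of how such heatmaps could be approximated, not the paper's actual formulation. It assumes a Grad-CAM-style attribution on a pretrained VGG-16, and uses two hypothetical proxies: evidence that suppresses the target-class score for term (1), and evidence supporting the strongest competing class for term (2). The hooked layer and the sign heuristic are illustrative assumptions.

import torch
import torch.nn.functional as F
from torchvision import models

# Assumed backbone and layer choice; the paper is model-agnostic (black-box setting).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
activations, gradients = {}, {}

def fwd_hook(_, __, output):
    activations["feat"] = output

def bwd_hook(_, __, grad_output):
    gradients["feat"] = grad_output[0]

# Hook the last convolutional layer of VGG-16 (an assumed choice).
layer = model.features[28]
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

def class_heatmap(image, class_idx, sign=+1.0):
    """Grad-CAM-like heatmap; sign=-1.0 keeps evidence *against* the class."""
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)  # channel importance
    cam = (weights * activations["feat"]).sum(dim=1)            # weighted feature sum
    cam = F.relu(sign * cam)                                     # keep one polarity only
    return cam / (cam.max() + 1e-8)

def undesirable_regions(image, target_idx):
    # (1) Factors for the target class: regions whose evidence suppresses the target.
    # (2) Factors for non-target classes: regions supporting the strongest competitor.
    with torch.no_grad():
        probs = model(image).softmax(dim=1)[0]
    competitor = probs.clone()
    competitor[target_idx] = -1.0
    non_target_idx = int(competitor.argmax())
    hindering = class_heatmap(image, target_idx, sign=-1.0)
    competing = class_heatmap(image, non_target_idx, sign=+1.0)
    return hindering, competing

The split into a sign flag and a competitor search mirrors the two definitions above: the first heatmap highlights what prevents the model from recognizing the target class, while the second highlights what pulls the prediction toward other classes.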
