Scaling Symbolic Methods using Gradients for Neural Model Explanation

Symbolic techniques based on Satisfiability Modulo Theories (SMT) solvers have been proposed for analyzing and verifying neural network properties, but their use has been fairly limited owing to their poor scalability on larger networks. In this work, we propose a technique that combines gradient-based methods with symbolic techniques to scale such analyses, and we demonstrate its application to model explanation. In particular, we apply the technique to identify minimal regions of an input that are most relevant to a neural network's prediction. Our approach uses gradient information (based on Integrated Gradients) to focus on a subset of neurons in the first layer, which allows the technique to scale to large networks. The corresponding SMT constraints encode the minimal-input-mask discovery problem: after masking the input, the activations of the selected neurons must remain above a threshold. After solving for the minimal masks, our approach scores the mask regions to produce a relative ordering of the features within the mask. This yields a saliency map that explains "where the model is looking" when it makes a prediction. We evaluate our technique on three datasets (MNIST, ImageNet, and Beer Reviews) and demonstrate, both quantitatively and qualitatively, that the regions our approach generates are sparser and achieve higher saliency scores than those from gradient-based methods alone.
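
To make the two-stage idea concrete, here is a minimal sketch of the masking-as-optimization formulation for a toy fully connected first layer, using Z3's Optimize engine (z3py). The neuron-selection step below ranks neurons by a simple contribution score as a crude stand-in for full Integrated Gradients, and `TOP_K`, `DELTA`, and all tensor shapes are illustrative choices, not values from the paper.

```python
# Sketch: minimal input mask via SMT, assuming a toy linear first layer.
import numpy as np
from z3 import Optimize, Bool, If, Sum, is_true, sat

rng = np.random.default_rng(0)
D_IN, D_HID, TOP_K, DELTA = 16, 8, 3, 0.5   # illustrative sizes/threshold

W = rng.normal(size=(D_HID, D_IN))   # first-layer weights
b = rng.normal(size=D_HID)           # first-layer biases
x = rng.uniform(size=D_IN)           # the input to be explained

# 1) Neuron selection: score each first-layer neuron by its pre-activation
#    on x (gradient-times-input for a linear layer; a stand-in here for the
#    Integrated Gradients scoring used in the paper).
pre_act = W @ x + b
selected = np.argsort(pre_act)[-TOP_K:]      # keep the TOP_K strongest neurons

# 2) SMT encoding: find a minimal binary mask m such that every selected
#    neuron's pre-activation on the masked input (m * x) stays above DELTA.
#    Because m is binary and x is fixed, each constraint is linear in m.
opt = Optimize()
mask = [Bool(f"m_{i}") for i in range(D_IN)]

for j in selected:
    masked_pre_act = Sum([If(mask[i], float(W[j, i] * x[i]), 0.0)
                          for i in range(D_IN)]) + float(b[j])
    opt.add(masked_pre_act >= DELTA)

# Objective: keep as few input features unmasked as possible.
opt.minimize(Sum([If(m, 1, 0) for m in mask]))

if opt.check() == sat:
    model = opt.model()
    kept = [i for i, m in enumerate(mask) if is_true(model[m])]
    print("minimal mask keeps features:", kept)
```

In the approach described above, the features kept by such a mask would then be scored (e.g., by their attribution values) to impose a relative ordering and render the final saliency map; this sketch only covers the selection and mask-discovery steps.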
