Neural Response Interpretation through the Lens of Critical Pathways

Is critical input information encoded in specific sparse pathways within the neural network? In this work, we discuss the problem of identifying these critical pathways and subsequently leverage them for interpreting the network's response to an input. The pruning objective of selecting the smallest group of neurons for which the response remains equivalent to that of the original network has previously been proposed for identifying critical pathways. We demonstrate that sparse pathways derived from pruning do not necessarily encode critical input information. To ensure that sparse pathways include critical fragments of the encoded input information, we propose selecting pathways via the neurons' contribution to the response. We then explain how critical pathways can reveal critical input features. We prove that pathways selected via neuron contribution are locally linear (within an ℓ2-ball), a property we use to propose a feature attribution method, "pathway gradient". We validate our interpretation method using mainstream evaluation experiments. This validation further confirms that pathways selected via neuron contributions correspond to critical input features. The code is publicly available.
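
The following is a minimal PyTorch sketch of the idea described above, not the authors' implementation: it assumes a ReLU network, approximates each neuron's contribution by |activation × gradient| of the target logit (the paper's exact contribution measure may differ), and forms a sparse pathway by keeping only the top keep_ratio fraction of ReLU neurons; the function name pathway_gradient and its parameters are illustrative.

```python
import torch
import torch.nn as nn


def pathway_gradient(model, x, target, keep_ratio=0.05):
    """Input attribution computed through a sparse pathway of
    high-contribution ReLU neurons; all other neurons are masked to zero."""
    model.eval()

    # Disable in-place ReLUs so intermediate activations can be retained.
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.inplace = False

    # --- Pass 1: score every ReLU neuron's contribution to the target logit.
    acts, hooks = [], []

    def save_act(module, inputs, output):
        output.retain_grad()
        acts.append(output)

    for m in model.modules():
        if isinstance(m, nn.ReLU):
            hooks.append(m.register_forward_hook(save_act))

    x1 = x.clone().requires_grad_(True)
    model(x1)[0, target].backward()
    for h in hooks:
        h.remove()

    # --- Build per-layer masks keeping the top keep_ratio fraction of neurons,
    #     ranked by |activation * gradient| (a simple contribution proxy).
    masks = []
    for a in acts:
        contrib = (a * a.grad).abs()
        k = max(1, int(keep_ratio * contrib.numel()))
        thresh = contrib.flatten().topk(k).values.min()
        masks.append((contrib >= thresh).float())

    # --- Pass 2: forward through the masked (sparse) pathway and take the
    #     gradient of the target logit w.r.t. the input.
    mask_iter = iter(masks)

    def apply_mask(module, inputs, output):
        return output * next(mask_iter)

    hooks = [m.register_forward_hook(apply_mask)
             for m in model.modules() if isinstance(m, nn.ReLU)]

    x2 = x.clone().detach().requires_grad_(True)
    model(x2)[0, target].backward()
    for h in hooks:
        h.remove()

    return x2.grad  # saliency map with the same shape as the input
```

For instance, with a pretrained torchvision VGG16 and a single preprocessed image batch of shape (1, 3, 224, 224), calling pathway_gradient(model, img, target=predicted_class) would return a saliency map the size of the input.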
