Robust Models Are More Interpretable Because Attributions Look Normal