The Definitions of Interpretability and Learning of Interpretable Models

As machine learning algorithms are adopted in an ever-increasing number of applications, interpretability has emerged as a crucial desideratum. In this paper, we propose a mathematical definition of a human-interpretable model. In particular, we define interpretability between two information processing systems. If a prediction model is interpretable by a human recognition system under this definition, we call it a completely human-interpretable model. We further design a practical framework that trains a completely human-interpretable model through user interaction. Experiments on image datasets demonstrate two advantages of the proposed model: 1) it exposes its entire decision-making process in a human-understandable form; 2) it is more robust against adversarial attacks.
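The adversarial-robustness claim can be probed with standard white-box attacks. Below is a minimal sketch of an FGSM-style evaluation; the `model`, `loader`, and `epsilon` names are hypothetical placeholders for whatever classifier and data pipeline are under test, and this illustrates the generic attack rather than the paper's specific evaluation protocol.

```python
# Minimal FGSM robustness check (a sketch, not the paper's protocol).
# Assumes an arbitrary image classifier `model` and a DataLoader `loader`
# yielding (image, label) batches; both are hypothetical placeholders.
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon, device="cpu"):
    """Accuracy on FGSM-perturbed inputs at perturbation budget `epsilon`."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images = images.to(device).requires_grad_(True)
        labels = labels.to(device)

        loss = F.cross_entropy(model(images), labels)
        grad = torch.autograd.grad(loss, images)[0]

        # One-step attack: move each pixel in the sign of the loss gradient.
        adv = (images + epsilon * grad.sign()).clamp(0.0, 1.0)

        with torch.no_grad():
            preds = model(adv).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

A model that is more robust in the sense claimed above would show a smaller gap between its clean accuracy and `fgsm_accuracy` as `epsilon` grows.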
