The Definitions of Interpretability and Learning of Interpretable Models

As machine learning algorithms are adopted in an ever-increasing number of applications, interpretability has emerged as a crucial desideratum. In this paper, we propose a mathematical definition of a human-interpretable model. In particular, we define interpretability between two information processing systems. If a prediction model is interpretable by a human recognition system under this definition, we call it a completely human-interpretable model. We further design a practical framework that trains a completely human-interpretable model through user interaction. Experiments on image datasets demonstrate two advantages of the proposed model: 1) it exposes its entire decision-making process in a human-understandable form; 2) it is more robust against adversarial attacks.
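The adversarial-robustness claim can be probed with standard white-box attacks. Below is a minimal sketch of an FGSM-style evaluation; the `model`, `loader`, and `epsilon` names are hypothetical placeholders for whatever classifier and data pipeline are under test, and this illustrates the generic attack rather than the paper's specific evaluation protocol.

```python
# Minimal FGSM robustness check (a sketch, not the paper's protocol).
# Assumes an arbitrary image classifier `model` and a DataLoader `loader`
# yielding (image, label) batches; both are hypothetical placeholders.
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon, device="cpu"):
    """Accuracy on FGSM-perturbed inputs at perturbation budget `epsilon`."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images = images.to(device).requires_grad_(True)
        labels = labels.to(device)

        loss = F.cross_entropy(model(images), labels)
        grad = torch.autograd.grad(loss, images)[0]

        # One-step attack: move each pixel in the sign of the loss gradient.
        adv = (images + epsilon * grad.sign()).clamp(0.0, 1.0)

        with torch.no_grad():
            preds = model(adv).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

A model that is more robust in the sense claimed above would show a smaller gap between its clean accuracy and `fgsm_accuracy` as `epsilon` grows.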
