i-Algebra: Towards Interactive Interpretability of Deep Neural Networks

Providing explanations for deep neural networks (DNNs) is essential for their use in domains wherein the interpretability of decisions is a critical prerequisite. Despite the plethora of work on interpreting DNNs, most existing solutions offer interpretability in an ad hoc, one-shot, and static manner, without accounting for the perception, understanding, or response of end-users, which results in poor usability in practice. In this paper, we argue that DNN interpretability should be implemented as interactions between users and models. We present i-Algebra, a first-of-its-kind interactive framework for interpreting DNNs. At its core is a library of atomic, composable operators that explain model behaviors at varying input granularities, during different inference stages, and from distinct interpretation perspectives. Leveraging a declarative query language, users can flexibly compose such operators to build a variety of analysis tools (e.g., "drill-down", "comparative", and "what-if" analysis). We prototype i-Algebra and conduct user studies on a set of representative analysis tasks, including inspecting adversarial inputs, resolving model inconsistency, and cleansing contaminated data, all of which demonstrate its promising usability.
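To make the operator-composition idea concrete, below is a minimal Python sketch of how atomic interpretation operators might be chained into a "drill-down" query. All names here (Operator, Attribution, Projection, DummyModel, and the attribute hook) are illustrative assumptions for this sketch, not the actual i-Algebra API.

import numpy as np

class DummyModel:
    # Stand-in model for illustration only: attribution = input magnitude.
    def attribute(self, x):
        return np.abs(x)

class Operator:
    # Atomic interpretation operator; operators compose via then().
    def apply(self, value, model):
        raise NotImplementedError
    def then(self, other):
        # Return a new operator that applies `self` first, then `other`.
        outer = self
        class _Composed(Operator):
            def apply(self, value, model):
                return other.apply(outer.apply(value, model), model)
        return _Composed()

class Attribution(Operator):
    # Coarse-grained attribution over the whole input.
    def apply(self, x, model):
        return model.attribute(x)

class Projection(Operator):
    # Restrict the explanation to a user-selected input region.
    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
    def apply(self, attribution, model):
        mask = np.zeros_like(attribution)
        mask[self.rows, self.cols] = 1.0
        return attribution * mask

# "Drill-down" analysis: attribute the full input, then zoom into a
# 16x16 window the user wants to inspect more closely.
query = Attribution().then(Projection(slice(8, 24), slice(8, 24)))
explanation = query.apply(np.random.randn(32, 32), DummyModel())

The point of the sketch is the design choice rather than the specific operators: because each operator is atomic and composition is associative, a declarative front-end can translate user queries into chains of operators without hard-coding each analysis workflow.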
