Captum: A unified and generic model interpretability library for PyTorch

In this paper we introduce a novel, unified, open-source model interpretability library for PyTorch [15]. The library contains generic implementations of a number of gradient- and perturbation-based attribution algorithms, also known as feature, neuron and layer importance algorithms, as well as a set of evaluation metrics for these algorithms. It can be used for both classification and non-classification models, including graph-structured models built on Neural Networks (NN). In this paper we give a high-level overview of the supported attribution algorithms and show how to perform memory-efficient and scalable computations. We emphasize three main characteristics of the library: multimodality, extensibility and ease of use. Multimodality means that the library supports inputs of different modalities, such as image, text, audio or video. Extensibility allows new algorithms and features to be added. The library is also designed to be easy to understand and use. In addition, we introduce an interactive visualization tool called Captum Insights, built on top of the Captum library, which enables sample-based model debugging and visualization using feature importance metrics.
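To make the attribution workflow described above concrete, the following is a minimal sketch of computing feature attributions with Captum's Integrated Gradients on a hypothetical two-layer classifier; the model, input shapes, baseline choice and target class are illustrative assumptions rather than examples taken from the paper.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Hypothetical two-layer classifier over 10 input features (illustrative only).
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))
model.eval()

inputs = torch.randn(4, 10)            # batch of 4 examples
baselines = torch.zeros_like(inputs)   # all-zero reference point

ig = IntegratedGradients(model)
# Attribute the score of class 1 to each input feature; the convergence
# delta estimates the approximation error of the path integral.
attributions, delta = ig.attribute(
    inputs, baselines=baselines, target=1, return_convergence_delta=True
)
print(attributions.shape)  # torch.Size([4, 10]) -- one score per input feature
print(delta.abs().max())   # small values indicate a good approximation
```

The all-zero baseline is only one common choice of reference point; any tensor matching the input shape can be passed as the baseline, and the same `attribute` call pattern applies to the other attribution algorithms in the library.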

[1] Avanti Shrikumar et al. Learning Important Features Through Propagating Activation Differences, 2017, ICML.

[2] Abhishek Das et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, 2017 IEEE International Conference on Computer Vision (ICCV).

[3] Pascal Sturmfels et al. Learning Explainable Models Using Attribution Priors, 2019, arXiv.

[4] Mukund Sundararajan et al. How Important Is a Neuron?, 2018, ICLR.

[5] Chih-Kuan Yeh et al. On the (In)fidelity and Sensitivity for Explanations, 2019, arXiv:1901.09392.

[6] Rob Fergus et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.

[7] Thomas Brox et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.

[8] Scott Lundberg et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.

[9] Lalana Kagal et al. Explaining Explanations: An Overview of Interpretability of Machine Learning, 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA).

[10] Tommi S. Jaakkola et al. On the Robustness of Interpretability Methods, 2018, arXiv.

[11] Been Kim et al. Sanity Checks for Saliency Maps, 2018, NeurIPS.

[12] D. Rubinfeld et al. Hedonic housing prices and the demand for clean air, 1978.

[13] Christopher Potts et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.

[14] Been Kim et al. Benchmarking Attribution Methods with Relative Feature Importance, 2019.

[15] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[16] Ankur Taly et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[17] Cengiz Öztireli et al. Towards better understanding of gradient-based attribution methods for Deep Neural Networks, 2017, ICLR.

[18] Martin Wattenberg et al. SmoothGrad: removing noise by adding noise, 2017, arXiv.

[19] Rich Caruana et al. InterpretML: A Unified Framework for Machine Learning Interpretability, 2019, arXiv.