EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction

Providing explanations along with predictions is crucial in some text processing tasks. We therefore propose a new self-interpretable model that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input. To do so, our model's prediction relies solely on a low-dimensional binary representation of the input, where each feature denotes the presence or absence of a concept. The presence of a concept is decided from an excerpt, i.e., a small sequence of consecutive words in the text. Concepts relevant to the prediction task at hand are defined automatically by our model, avoiding the need for concept-level annotations. To ease interpretability, we enforce that for each concept, the corresponding excerpts share similar semantics and are distinguishable from one another. We experimentally demonstrate the relevance of our approach on text classification and multi-aspect sentiment analysis tasks.
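
To make the described pipeline concrete, below is a minimal PyTorch sketch of this kind of architecture. It is not the authors' implementation: the sliding-window excerpt encoder, the max-pooling over excerpts, the straight-through binarization, and all module names and dimensions are assumptions made for illustration. It only shows the key property stated in the abstract: the classifier sees nothing but the binary concept-presence vector, which therefore doubles as the explanation.

```python
# Hypothetical sketch of an excerpt-based concept classifier (not the EDUCE code).
import torch
import torch.nn as nn

class ConceptClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, n_concepts=10,
                 n_classes=4, excerpt_len=5):
        super().__init__()
        self.excerpt_len = excerpt_len
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Scores each excerpt for the presence of each concept (assumed linear).
        self.concept_scorer = nn.Linear(excerpt_len * emb_dim, n_concepts)
        # The final prediction uses only the binary concept vector.
        self.classifier = nn.Linear(n_concepts, n_classes)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        emb = self.embed(tokens)                   # (batch, seq_len, emb_dim)
        # Slide a window to form all excerpts of `excerpt_len` consecutive words.
        exc = emb.unfold(1, self.excerpt_len, 1)   # (batch, n_exc, emb_dim, excerpt_len)
        exc = exc.permute(0, 1, 3, 2).flatten(2)   # (batch, n_exc, excerpt_len * emb_dim)
        scores = self.concept_scorer(exc)          # (batch, n_exc, n_concepts)
        # A concept is "present" if some excerpt activates it: max over excerpts.
        probs = torch.sigmoid(scores.max(dim=1).values)  # (batch, n_concepts)
        # Straight-through estimator: hard 0/1 forward pass, soft gradients backward.
        hard = (probs > 0.5).float()
        z = hard + probs - probs.detach()          # binary concept vector
        return self.classifier(z), z               # class logits + explanation
```

The excerpt that maximizes each active concept's score can then be shown to the user as the textual evidence for that concept; the semantic-similarity and separability constraints mentioned above would be added as auxiliary losses on the selected excerpts.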
