Corpus-level and Concept-based Explanations for Interpretable Document Classification

A popular way to interpret attention-based neural networks is to use attention weights to identify the information that matters for a model's decisions, commonly by creating a heat-map for each individual document. However, this interpretation method is fragile. In this paper, we propose a corpus-level explanation approach that aims to capture causal relationships between keywords and model predictions by learning, from attention weights, the importance of keywords for predicted labels across a training corpus. Using this idea as the fundamental building block, we further propose a concept-based explanation method that automatically learns higher-level concepts and their importance to the model's predictions. The concept-based method is built on a novel Abstraction-Aggregation Network, which automatically clusters important keywords during end-to-end training. We apply these methods to document classification and show that they extract semantically meaningful keywords and concepts. A consistency analysis based on an attention-based Naive Bayes Classifier further indicates that these keywords and concepts are important for model predictions.
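To make the corpus-level idea concrete, the following is a minimal sketch of how attention weights could be pooled across a corpus to rank keywords per predicted label. It is an illustration under stated assumptions, not the method's actual formulation: the summation-and-normalization scoring, the function names `corpus_level_importance` and `top_keywords`, and the toy documents are all hypothetical.

```python
from collections import defaultdict


def corpus_level_importance(docs):
    """Pool attention weights per (predicted label, token) across a corpus.

    docs: iterable of (tokens, attention_weights, predicted_label) triples.
    Assumption: importance is the token's share of the total attention mass
    observed for that label; this scoring rule is illustrative only.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for tokens, weights, label in docs:
        for token, weight in zip(tokens, weights):
            totals[label][token] += weight

    importance = {}
    for label, token_sums in totals.items():
        norm = sum(token_sums.values())
        if norm > 0:
            # Normalize so the scores for each label sum to one.
            importance[label] = {t: s / norm for t, s in token_sums.items()}
    return importance


def top_keywords(importance, label, k=3):
    """Return the k tokens with the highest pooled importance for a label."""
    ranked = sorted(importance[label].items(), key=lambda kv: kv[1], reverse=True)
    return [token for token, _ in ranked[:k]]


# Toy corpus (hypothetical): tokens, per-token attention, predicted label.
docs = [
    (["great", "food", "here"], [0.7, 0.2, 0.1], "positive"),
    (["great", "service"], [0.6, 0.4], "positive"),
    (["terrible", "food"], [0.8, 0.2], "negative"),
]
importance = corpus_level_importance(docs)
print(top_keywords(importance, "positive", k=2))
```

Unlike a per-document heat-map, the ranking here depends on attention mass accumulated over every document that received a given label, so a single noisy attention distribution has limited influence on the result.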
