Structured Self-Attention Weights Encode Semantics in Sentiment Analysis

Neural attention, especially the self-attention popularized by the Transformer, has become the workhorse of state-of-the-art natural language processing (NLP) models. Recent work suggests that self-attention in the Transformer encodes syntactic information; here, we show that self-attention scores also encode semantics, using sentiment analysis tasks as a testbed. In contrast to gradient-based feature attribution methods, we propose a simple and effective Layer-wise Attention Tracing (LAT) method to analyze structured attention weights. We apply the method to Transformer models trained on two tasks that differ on the surface but share common semantics: sentiment analysis of movie reviews and time-series valence prediction in life story narratives. Across both tasks, words with high aggregated attention weights were rich in emotional semantics, as quantitatively validated against an emotion lexicon labeled by human annotators. Our results show that structured attention weights encode rich semantics in sentiment analysis and match human interpretations of those semantics.
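The abstract names the Layer-wise Attention Tracing (LAT) procedure but does not spell out its mechanics, so the Python sketch below illustrates one plausible way to aggregate attention weights layer by layer and rank input words by the attention they receive, in the spirit of attention-rollout-style analyses. The function name trace_attention, the head-averaging step, and the backward chaining of attention matrices are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def trace_attention(attn_per_layer, head_reduce="mean"):
    """Aggregate attention from the top layer back to the input tokens.

    attn_per_layer: list of arrays, one per layer (lowest layer first),
        each of shape (num_heads, seq_len, seq_len) with rows summing to 1.
    Returns a (seq_len,) vector of aggregated attention weight per token.
    """
    # Collapse the head dimension, then chain the per-layer attention
    # matrices from the top layer down to the input embeddings.
    layers = [a.max(axis=0) if head_reduce == "max" else a.mean(axis=0)
              for a in attn_per_layer]
    traced = layers[-1]
    for layer in reversed(layers[:-1]):
        traced = traced @ layer  # route attention mass down one layer
    # Total attention received by each input token, summed over query positions.
    return traced.sum(axis=0)

# Toy usage: 2 layers, 4 heads, 5 tokens, random row-stochastic attention.
rng = np.random.default_rng(0)
attn = [rng.dirichlet(np.ones(5), size=(4, 5)) for _ in range(2)]
scores = trace_attention(attn)
print(np.argsort(scores)[::-1])  # token indices ranked by traced attention
```

Under these assumptions, words whose tokens receive the largest traced weights would be the ones compared against the human-labeled emotion lexicon.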
