Paper information - A Multiscale Visualization of Attention in the Transformer Model

The Transformer is a sequence model that forgoes traditional recurrent architectures in favor of a fully attention-based approach. Besides improving performance, an advantage of using attention is that it can also help to interpret a model by showing how the model assigns weight to different input elements. However, the multi-layer, multi-head attention mechanism in the Transformer model can be difficult to decipher. To make the model more accessible, we introduce an open-source tool that visualizes attention at multiple scales, each of which provides a unique perspective on the attention mechanism. We demonstrate the tool on BERT and OpenAI GPT-2 and present three example use cases: detecting model bias, locating relevant attention heads, and linking neurons to model behavior.
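To make "how the model assigns weight to different input elements" concrete, here is a minimal sketch of the quantity such a tool would display: scaled dot-product attention weights, computed per layer and per head. It is not the tool's code. The token list, the random query and key vectors, the layer and head counts, and the helper names (`softmax`, `attention_weights`, `random_vectors`) are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights for one head.

    queries, keys: one vector (list of floats) per token.
    Returns weights[i][j], the weight token i assigns to token j.
    """
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
        for query in queries
    ]


def random_vectors(rng, n_tokens, dim):
    """Hypothetical stand-in for learned query/key projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]


tokens = ["the", "cat", "sat"]  # toy input, assumed for illustration
rng = random.Random(0)
n_layers, n_heads, head_dim = 2, 3, 4  # arbitrary toy sizes

# attention[layer][head][i][j]: toy stand-in for a model's attention tensor
attention = [
    [
        attention_weights(
            random_vectors(rng, len(tokens), head_dim),
            random_vectors(rng, len(tokens), head_dim),
        )
        for _ in range(n_heads)
    ]
    for _ in range(n_layers)
]

# One head in one layer: one row of weights per attending token
for token, row in zip(tokens, attention[0][0]):
    print(token, [round(w, 2) for w in row])
```

Each row is a probability distribution over the input tokens. Even this toy setup yields a layers × heads grid of such matrices, which suggests why the multi-layer, multi-head mechanism is hard to read without a dedicated visualization.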

Jesse Vig