SANVis: Visual Analytics for Understanding Self-Attention Networks

Attention networks, a deep neural network architecture inspired by humans’ attention mechanism, have seen significant success in image captioning, machine translation, and many other applications. Recently, they have been further evolved into an advanced approach called multi-head self-attention networks, which can encode a set of input vectors, e.g., word vectors in a sentence, into another set of vectors. Such encoding aims at simultaneously capturing diverse syntactic and semantic features within a set, each of which corresponds to a particular attention head, forming altogether multi-head attention. Meanwhile, the increased model complexity prevents users from easily understanding and manipulating the inner workings of models. To tackle the challenges, we present a visual analytics system called SANVis, which helps users understand the behaviors and the characteristics of multi-head self-attention networks. Using a state-of-the-art self-attention model called Transformer, we demonstrate usage scenarios of SANVis in machine translation tasks. Our system is available at http://short.sanvis.org.

[1]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Alexander M. Rush,et al.  Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models , 2018, IEEE Transactions on Visualization and Computer Graphics.

[3]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[4]  Zhen Li,et al.  Towards Better Analysis of Deep Convolutional Neural Networks , 2016, IEEE Transactions on Visualization and Computer Graphics.

[5]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[6]  Fedor Moiseev,et al.  Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned , 2019, ACL.

[7]  Dongyu Liu,et al.  DeepTracker: Visualizing the Training Process of Convolutional Neural Networks , 2018, ACM Trans. Intell. Syst. Technol..

[8]  Graham Neubig,et al.  Extreme Adaptation for Personalized Neural Machine Translation , 2018, ACL.

[9]  Zhen Li,et al.  Understanding Hidden Memories of Recurrent Neural Networks , 2017, 2017 IEEE Conference on Visual Analytics Science and Technology (VAST).

[10]  Hao Yang,et al.  GANViz: A Visual Analytics Approach to Understand the Adversarial Game , 2018, IEEE Transactions on Visualization and Computer Graphics.

[11]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[12]  Omer Levy,et al.  What Does BERT Look at? An Analysis of BERT’s Attention , 2019, BlackboxNLP@ACL.

[13]  Alexander M. Rush,et al.  LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks , 2016, IEEE Transactions on Visualization and Computer Graphics.

[14]  Hao Yang,et al.  DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks , 2019, IEEE Transactions on Visualization and Computer Graphics.

[15]  Martin Wattenberg,et al.  GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation , 2018, IEEE Transactions on Visualization and Computer Graphics.

[16]  Alexander M. Rush,et al.  OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.

[17]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[18]  Dylan Cashman,et al.  RNNbow: Visualizing Learning Via Backpropagation Gradients in RNNs , 2018, IEEE Computer Graphics and Applications.

[19]  Andrew McCallum,et al.  Linguistically-Informed Self-Attention for Semantic Role Labeling , 2018, EMNLP.

[20]  Jean-Daniel Fekete,et al.  Small MultiPiles: Piling Time to Explore Temporal Patterns in Dynamic Networks , 2015, Comput. Graph. Forum.

[21]  Yonatan Belinkov,et al.  Analyzing the Structure of Attention in a Transformer Language Model , 2019, BlackboxNLP@ACL.

[22]  Jimeng Sun,et al.  RetainVis: Visual Analytics with Interpretable and Interactive Recurrent Neural Networks on Electronic Medical Records , 2018, IEEE Transactions on Visualization and Computer Graphics.

[23]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[24]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[25]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[26]  Xiaoming Liu,et al.  Do Convolutional Neural Networks Learn Class Hierarchy? , 2017, IEEE Transactions on Visualization and Computer Graphics.

[27]  Jean-Daniel Fekete,et al.  NodeTrix: a Hybrid Visualization of Social Networks , 2007, IEEE Transactions on Visualization and Computer Graphics.

[28]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[29]  Elmar Eisemann,et al.  DeepEyes: Progressive Visual Analytics for Designing Deep Neural Networks , 2018, IEEE Transactions on Visualization and Computer Graphics.

[30]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.