Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Fedor Moiseev | Rico Sennrich | Ivan Titov | David Talbot | Elena Voita