On the Sub-Layer Functionalities of Transformer Decoder
Yilin Yang | Longyue Wang | Shuming Shi | Prasad Tadepalli | Stefan Lee | Zhaopeng Tu