An Efficient Transformer Decoder with Compressed Sub-layers
Jingbo Zhu | Tong Xiao | Yanyang Li | Ye Lin