The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
Zaixiang Zheng | Mingxuan Wang | Hao Zhou | Jiangtao Feng | Lei Li | Lihua Qian | Yi Zhou | Yaoming Zhu | Zehui Lin | Shanbo Cheng