Tencent AI Lab Machine Translation Systems for the WMT21 Biomedical Translation Task

This paper describes the Tencent AI Lab submission to the WMT21 shared task on biomedical translation in eight language directions across four language pairs: English↔German, English↔French, English↔Spanish, and English↔Russian. We utilized different Transformer architectures, pretraining strategies, and back-translation strategies to improve translation quality. Concretely, we explored mBART (Liu et al., 2020) to demonstrate the effectiveness of the pretraining strategy. According to the official evaluation results in terms of BLEU scores, our submissions (Tencent AI Lab Machine Translation, TMT) ranked first in each of the German⇒English, French⇒English, and Spanish⇒English directions.
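The back-translation strategy mentioned above augments the parallel training data with synthetic pairs: target-side monolingual sentences are translated back into the source language by a reverse-direction model, and the resulting (synthetic source, real target) pairs are mixed with the genuine parallel corpus. A minimal conceptual sketch is below; `toy_reverse` is a hypothetical stand-in for a trained target-to-source NMT model, not part of the authors' system.

```python
def back_translate(monolingual_tgt, reverse_model):
    """Build synthetic (source, target) pairs from target-side
    monolingual sentences using a reverse-direction model."""
    return [(reverse_model(t), t) for t in monolingual_tgt]

# Hypothetical stand-in for a trained target->source translation model;
# it just tags its input so the synthetic pairs are easy to inspect.
toy_reverse = lambda s: "<synthetic> " + s

parallel = [("This is real.", "Das ist echt.")]           # genuine parallel data
mono = ["Das ist ein Test.", "Guten Morgen."]             # target-side monolingual data

# Training data = genuine pairs + back-translated synthetic pairs.
augmented = parallel + back_translate(mono, toy_reverse)
```

In practice the synthetic and genuine pairs are shuffled together (often with the synthetic side tagged or down-weighted) before training the forward-direction model.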

[1] Lemao Liu et al. TranSmart: A Practical Interactive Machine Translation System. arXiv, 2021.

[2] Marc'Aurelio Ranzato et al. Analyzing Uncertainty in Neural Machine Translation. ICML, 2018.

[3] Maite Oronoz et al. Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages. WMT, 2020.

[4] Shuming Shi et al. Exploiting Deep Representations for Natural Language Processing. Neurocomputing, 2020.

[5] Zhaopeng Tu et al. Understanding and Improving Lexical Choice in Non-Autoregressive Translation. ICLR, 2020.

[6] Shilin He et al. Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation. NAACL, 2020.

[7] Shilin He et al. Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models. arXiv, 2020.

[8] Shuming Shi et al. Exploiting Deep Representations for Neural Machine Translation. EMNLP, 2018.

[9] Xing Wang et al. Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons. EMNLP/IJCNLP, 2019.

[10] Shilin He et al. Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation. EMNLP, 2020.

[11] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[12] Jingbo Zhu et al. The NiuTrans Machine Translation Systems for WMT19. WMT, 2019.

[13] Marjan Ghazvininejad et al. Multilingual Denoising Pre-training for Neural Machine Translation. Transactions of the Association for Computational Linguistics, 2020.

[14] Huda Khayrallah et al. On the Impact of Various Types of Noise on Neural Machine Translation. NMT@ACL, 2018.

[15] Shuming Shi et al. Tencent AI Lab Machine Translation Systems for the WMT20 Biomedical Translation Task. WMT, 2020.

[16] Shuming Shi et al. Tencent Neural Machine Translation Systems for the WMT20 News Translation Task. WMT, 2020.

[17] Taku Kudo et al. SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing. EMNLP, 2018.

[18] Jiajun Zhang et al. Improving Autoregressive NMT with Non-Autoregressive Model. AutoSimTrans, 2020.

[19] Zhaopeng Tu et al. Tencent Translation System for the WMT21 News Translation Task. WMT, 2021.

[20] K. Bretonnel Cohen et al. Findings of the WMT 2019 Biomedical Translation Shared Task: Evaluation for MEDLINE Abstracts and Biomedical Terminologies. WMT, 2019.

[21] Lukasz Kaiser et al. Attention Is All You Need. NIPS, 2017.

[22] Shuming Shi et al. Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task. WMT, 2020.