The AISP-SJTU Translation System for WMT 2022

This paper describes AISP-SJTU's participation in the WMT 2022 General Machine Translation shared task. We participated in four translation directions: English-Chinese, Chinese-English, English-Japanese and Japanese-English. Our systems are based on the Transformer architecture with several novel and effective variants to its network depth and internal structure. In our experiments, we employ data filtering, large-scale back-translation, knowledge distillation, forward-translation, iterative in-domain knowledge fine-tuning and model ensembling. The constrained systems achieve case-sensitive BLEU scores of 48.8, 29.7, 39.3 and 22.0 on EN-ZH, ZH-EN, EN-JA and JA-EN, respectively.
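
As a hedged illustration of one step in this pipeline (not the authors' actual code), the Python sketch below shows how back-translated synthetic parallel data can be assembled from target-side monolingual text; the `reverse_model` callable and the `<BT>` tag are assumptions made for the example.

```python
# Minimal sketch of back-translation data construction, assuming a generic
# target->source translation model exposed as a callable; all names here are
# illustrative and do not correspond to the authors' pipeline.

def build_back_translation_pairs(monolingual_tgt, reverse_model, tag="<BT>"):
    """Create synthetic (source, target) pairs from target-side monolingual text.

    monolingual_tgt: iterable of target-language sentences
    reverse_model:   callable mapping a target sentence to a source sentence
    tag:             optional token prepended to synthetic sources (tagged BT)
    """
    pairs = []
    for tgt in monolingual_tgt:
        src = reverse_model(tgt)            # translate target -> source
        pairs.append((f"{tag} {src}", tgt))
    return pairs

# The synthetic pairs are then mixed with genuine bitext when training the
# forward (source -> target) model.
```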
