The AISP-SJTU Translation System for WMT 2022

This paper describes AISP-SJTU's participation in the WMT 2022 General Machine Translation shared task. We participated in four translation directions: English-Chinese, Chinese-English, English-Japanese and Japanese-English. Our systems are based on the Transformer architecture with several novel and effective variants to its network depth and internal structure. In our experiments, we employ data filtering, large-scale back-translation, knowledge distillation, forward-translation, iterative in-domain knowledge fine-tuning and model ensembling. The constrained systems achieve case-sensitive BLEU scores of 48.8, 29.7, 39.3 and 22.0 on EN-ZH, ZH-EN, EN-JA and JA-EN, respectively.
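
As a hedged illustration of one step in this pipeline (not the authors' actual code), the Python sketch below shows how back-translated synthetic parallel data can be assembled from target-side monolingual text; the `reverse_model` callable and the `<BT>` tag are assumptions made for the example.

```python
# Minimal sketch of back-translation data construction, assuming a generic
# target->source translation model exposed as a callable; all names here are
# illustrative and do not correspond to the authors' pipeline.

def build_back_translation_pairs(monolingual_tgt, reverse_model, tag="<BT>"):
    """Create synthetic (source, target) pairs from target-side monolingual text.

    monolingual_tgt: iterable of target-language sentences
    reverse_model:   callable mapping a target sentence to a source sentence
    tag:             optional token prepended to synthetic sources (tagged BT)
    """
    pairs = []
    for tgt in monolingual_tgt:
        src = reverse_model(tgt)            # translate target -> source
        pairs.append((f"{tag} {src}", tgt))
    return pairs

# The synthetic pairs are then mixed with genuine bitext when training the
# forward (source -> target) model.
```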
