CASIA’s System for IWSLT 2020 Open Domain Translation

This paper describes CASIA's system for the IWSLT 2020 open domain translation task. This year we participate in both the Chinese→Japanese and Japanese→Chinese translation tasks. Our system is a neural machine translation system based on the Transformer model. We augment the training data with knowledge distillation and back-translation to improve translation performance. Domain data classification and a weighted domain model ensemble are introduced to generate the final translation result. We compare and analyze performance on the development data under different model settings and data processing techniques.
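As a rough illustration of the weighted domain model ensemble idea mentioned above, the sketch below linearly interpolates next-token probability distributions from several domain models using normalized domain weights. The toy vocabulary, distributions, and weights are illustrative assumptions for this sketch, not components of the actual system.

```python
# A minimal, self-contained sketch of weighted model ensembling:
# each "model" is represented by a toy next-token distribution, and the
# ensemble linearly interpolates the distributions with domain weights.
# All numbers below are placeholders, not values from the actual system.
import numpy as np

def weighted_ensemble(distributions, weights):
    """Linearly interpolate per-model next-token distributions."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize domain weights
    dists = np.asarray(distributions)                 # shape: (num_models, vocab)
    return np.einsum("m,mv->v", w, dists)             # weighted sum over models

# Toy vocabulary of size 4; two domain models disagree on the best token.
in_domain_model  = np.array([0.10, 0.60, 0.20, 0.10])
out_domain_model = np.array([0.40, 0.20, 0.30, 0.10])

combined = weighted_ensemble([in_domain_model, out_domain_model],
                             weights=[0.7, 0.3])      # favor the in-domain model
print(combined, combined.argmax())                    # the in-domain preference wins
```

In the same spirit, per-sentence domain weights could come from a domain classifier, so that in-domain models dominate the interpolation when the input is classified as in-domain.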
