Meta Ensemble for Japanese-Chinese Neural Machine Translation: Kyoto-U+ECNU Participation to WAT 2020

This paper describes the Japanese-Chinese Neural Machine Translation (NMT) system submitted by the joint team of Kyoto University and East China Normal University (Kyoto-U+ECNU) to WAT 2020 (Nakazawa et al., 2020). We participate in the ASPEC Japanese-Chinese translation task. We revisit several techniques for NMT, including various architectures, different data selection and augmentation methods, denoising pre-training, and several tricks specific to Japanese-Chinese translation. Finally, we perform a meta ensemble that combines all of the models into a single system. In terms of BLEU, this meta-ensembled system ranks first in both directions of the ASPEC Japanese-Chinese translation task.
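
The abstract does not spell out how the ensemble is computed; as a rough, hedged illustration of what decoding-time ensembling over several NMT models typically looks like, the minimal Python sketch below averages per-token probabilities across models at each decoding step. The model callables, vocabulary, and helper name are hypothetical stand-ins for illustration only, not the authors' code or the fairseq ensembling API.

import math

def ensemble_next_token_logprobs(models, prefix):
    """Average next-token probabilities from several models for one decoding step.

    `models` is a list of callables mapping a target-side prefix to a
    dict {token: probability}; these are hypothetical stand-ins for
    real NMT decoders, used here only to show the combination rule.
    """
    dists = [model(prefix) for model in models]
    vocab = set().union(*dists)
    # Uniform-weight ensemble: average probabilities, then take the log
    # so scores can be accumulated along a beam-search hypothesis.
    return {
        tok: math.log(sum(d.get(tok, 0.0) for d in dists) / len(dists))
        for tok in vocab
    }

# Toy usage with two fake "models" over a three-token vocabulary.
m1 = lambda prefix: {"猫": 0.7, "犬": 0.2, "</s>": 0.1}
m2 = lambda prefix: {"猫": 0.5, "犬": 0.4, "</s>": 0.1}
print(ensemble_next_token_logprobs([m1, m2], prefix=["これ", "は"]))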

[1] Rico Sennrich et al. Improving Neural Machine Translation Models with Monolingual Data. ACL, 2015.

[2] Chenhui Chu et al. Chinese Characters Mapping Table of Japanese, Traditional Chinese and Simplified Chinese. LREC, 2012.

[3] Ulrich Germann et al. Character Mapping and Ad-hoc Adaptation: Edinburgh's IWSLT 2020 Open Domain Translation System. IWSLT, 2020.

[4] Salim Roukos et al. Bleu: a Method for Automatic Evaluation of Machine Translation. ACL, 2002.

[5] Yann Dauphin et al. Pay Less Attention with Lightweight and Dynamic Convolutions. ICLR, 2019.

[6] Yann Dauphin et al. Convolutional Sequence to Sequence Learning. ICML, 2017.

[7] Daisuke Kawahara et al. Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model. EMNLP, 2015.

[8] Marjan Ghazvininejad et al. Multilingual Denoising Pre-training for Neural Machine Translation. Transactions of the Association for Computational Linguistics, 2020.

[9] Holger Schwenk et al. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. Transactions of the Association for Computational Linguistics, 2018.

[10] Omer Levy et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. ACL, 2019.

[11] Myle Ott et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. NAACL, 2019.

[12] Ming Zhou et al. Explicit Cross-lingual Pre-training for Unsupervised Machine Translation. EMNLP, 2019.

[13] Chenhui Chu et al. An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation. ACL, 2017.

[14] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.

[15] Isao Goto et al. Overview of the 7th Workshop on Asian Translation. WAT@AACL-IJCNLP, 2020.

[16] Sadao Kurohashi et al. MMCR4NLP: Multilingual Multiway Corpora Repository for Natural Language Processing. arXiv, 2017.

[17] Jungo Kasai et al. Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation. arXiv, 2020.

[18] Xu Tan et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation. ICML, 2019.

[19] Toshiaki Nakazawa et al. ASPEC: Asian Scientific Paper Excerpt Corpus. LREC, 2016.

[20] Nadir Durrani et al. Findings of the IWSLT 2020 Evaluation Campaign. IWSLT, 2020.

[21] Jörg Tiedemann et al. Parallel Data, Tools and Interfaces in OPUS. LREC, 2012.

[22] Myle Ott et al. Understanding Back-Translation at Scale. EMNLP, 2018.

[23] Quoc V. Le et al. Sequence to Sequence Learning with Neural Networks. NIPS, 2014.

[24] Lukasz Kaiser et al. Attention is All you Need. NIPS, 2017.

[25] Chenhui Chu et al. Constructing a Chinese-Japanese Parallel Corpus from Wikipedia. LREC, 2014.

[26] Jörg Tiedemann et al. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. LREC, 2016.

[27] Taku Kudo et al. SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing. EMNLP, 2018.

[28] Yoshua Bengio et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2014.

[29] Guillaume Lample et al. Cross-lingual Language Model Pretraining. NeurIPS, 2019.

[30] Rico Sennrich et al. Neural Machine Translation of Rare Words with Subword Units. ACL, 2015.

[31] Holger Schwenk et al. WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia. EACL, 2019.

[32] Chenhui Chu et al. Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese-Japanese Wikipedia. TALLIP, 2016.