Incorporating Word and Subword Units in Unsupervised Machine Translation Using Language Model Rescoring

This paper describes CAiRE's submission to the unsupervised machine translation track of the WMT'19 news shared task from German to Czech. We leverage a phrase-based statistical machine translation (PBSMT) model and a pre-trained language model to combine word-level and subword-level neural machine translation (NMT) models, without using any parallel data. To address the morphological richness of the two languages, we train byte-pair encoding (BPE) embeddings for German and Czech separately and align them using MUSE (Conneau et al., 2018). To ensure the fluency and consistency of translations, we further propose a rescoring mechanism that reuses the pre-trained language model to select among the translation candidates generated through beam search. Finally, a series of pre-processing and post-processing steps is applied to improve the quality of the final translations.
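The subword pipeline can be pictured as follows: each language's monolingual corpus is BPE-segmented, embeddings are trained on the segmented text, and the two embedding spaces are aligned without supervision. Below is a minimal sketch using subword-nmt, fastText, and MUSE's unsupervised.py; the file paths, vocabulary size, and embedding dimension are illustrative assumptions, not the settings reported in the paper.

```python
import subprocess

import fasttext                                 # pip install fasttext
from subword_nmt.learn_bpe import learn_bpe    # pip install subword-nmt
from subword_nmt.apply_bpe import BPE


def segment_corpus(raw_path, codes_path, seg_path, num_symbols=50000):
    """Learn a BPE code table on one monolingual corpus, then segment it."""
    with open(raw_path, encoding="utf-8") as fin, \
         open(codes_path, "w", encoding="utf-8") as fcodes:
        learn_bpe(fin, fcodes, num_symbols)
    with open(codes_path, encoding="utf-8") as fcodes:
        bpe = BPE(fcodes)
    with open(raw_path, encoding="utf-8") as fin, \
         open(seg_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(bpe.process_line(line))


def train_and_export(seg_path, vec_path, dim=300):
    """Train skip-gram embeddings on BPE-segmented text and save them in
    the word2vec text format that MUSE reads."""
    model = fasttext.train_unsupervised(seg_path, model="skipgram", dim=dim)
    words = model.get_words()
    with open(vec_path, "w", encoding="utf-8") as f:
        f.write(f"{len(words)} {dim}\n")
        for w in words:
            vec = " ".join(f"{x:.5f}" for x in model.get_word_vector(w))
            f.write(f"{w} {vec}\n")


# Separate BPE codes and embeddings per language, as described above.
for lang in ("de", "cs"):
    segment_corpus(f"mono.{lang}", f"codes.{lang}", f"mono.bpe.{lang}")
    train_and_export(f"mono.bpe.{lang}", f"emb.{lang}.vec")

# Align the two monolingual embedding spaces without any parallel data
# (adversarial initialization plus refinement; Conneau et al., 2018).
subprocess.run(["python", "MUSE/unsupervised.py",
                "--src_lang", "de", "--tgt_lang", "cs",
                "--src_emb", "emb.de.vec", "--tgt_emb", "emb.cs.vec",
                "--emb_dim", "300", "--n_refinement", "5"], check=True)
```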
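The rescoring step can likewise be sketched as re-ranking the n-best list from beam search by interpolating the translation model score with a length-normalized score from the pre-trained language model. The function name, interpolation weight, and normalization below are stand-ins; the abstract does not give the exact scoring formula.

```python
from typing import Callable, List, Tuple


def lm_rescore(nbest: List[Tuple[List[str], float]],
               lm_log_prob: Callable[[List[str]], float],
               lm_weight: float = 0.5) -> List[str]:
    """Pick the beam-search candidate maximizing an interpolation of the
    length-normalized translation score and LM log-probability."""
    def combined(candidate):
        tokens, trans_score = candidate
        n = max(len(tokens), 1)  # length normalization
        return ((1.0 - lm_weight) * trans_score / n
                + lm_weight * lm_log_prob(tokens) / n)
    return max(nbest, key=combined)[0]


# Toy usage: a hypothetical unigram table stands in for the pre-trained LM.
logp = {"ahoj": -2.0, "světe": -3.0, "svete": -8.0}
toy_lm = lambda tokens: sum(logp.get(t, -10.0) for t in tokens)
nbest = [(["ahoj", "svete"], -1.2),  # slightly better translation score
         (["ahoj", "světe"], -1.5)]  # much better LM score
print(lm_rescore(nbest, toy_lm))     # -> ['ahoj', 'světe']
```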

[1] Rico Sennrich et al. Improving Neural Machine Translation Models with Monolingual Data. ACL, 2015.

[2] Lukasz Kaiser et al. Attention is All you Need. NIPS, 2017.

[3] Guillaume Lample et al. Unsupervised Machine Translation Using Monolingual Corpora Only. ICLR, 2017.

[4] Yann Dauphin et al. Convolutional Sequence to Sequence Learning. ICML, 2017.

[5] Pascale Fung et al. CAiRE_HKUST at SemEval-2019 Task 3: Hierarchical Attention for Dialogue Emotion Classification. SemEval@NAACL-HLT, 2019.

[6] Richard Socher et al. Regularizing and Optimizing LSTM Language Models. ICLR, 2017.

[7] Soumith Chintala et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2015.

[8] Guillaume Lample et al. Phrase-Based & Neural Unsupervised Machine Translation. EMNLP, 2018.

[9] Franz Josef Och et al. Minimum Error Rate Training in Statistical Machine Translation. ACL, 2003.

[10] Guillaume Lample et al. Cross-lingual Language Model Pretraining. NeurIPS, 2019.

[11] Xin Wang et al. Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation. NAACL, 2019.

[12] Rico Sennrich et al. Neural Machine Translation of Rare Words with Subword Units. ACL, 2015.

[13] Yoshua Bengio et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2014.

[14] Philipp Koehn et al. Moses: Open Source Toolkit for Statistical Machine Translation. ACL, 2007.

[15] Pascale Fung et al. Team yeon-zi at SemEval-2019 Task 4: Hyperpartisan News Detection by De-noising Weakly-labeled Data. SemEval@NAACL-HLT, 2019.

[16] Peng Xu et al. PlusEmo2Vec at SemEval-2018 Task 1: Exploiting emotion knowledge from emoji and #hashtags. SemEval@NAACL-HLT, 2018.

[17] Yannick Versley et al. Statistical Parsing of Morphologically Rich Languages (SPMRL): What, How and Whither. SPMRL@NAACL-HLT, 2010.

[18] Richard Socher et al. An Analysis of Neural Language Modeling at Multiple Scales. arXiv, 2018.

[19] Ryan Cotterell et al. Are All Languages Equally Hard to Language-Model? NAACL, 2018.

[20] Alexis Conneau et al. Word Translation Without Parallel Data. ICLR, 2018.

[21] Kenneth Heafield et al. KenLM: Faster and Smaller Language Model Queries. WMT@EMNLP, 2011.

[22] Samy Bengio et al. Tensor2Tensor for Neural Machine Translation. AMTA, 2018.

[23] Eneko Agirre et al. Learning bilingual word embeddings with (almost) no bilingual data. ACL, 2017.

[24] Eneko Agirre et al. Unsupervised Statistical Machine Translation. EMNLP, 2018.

[25] Tomas Mikolov et al. Enriching Word Vectors with Subword Information. TACL, 2016.

[26] Eneko Agirre et al. Unsupervised Neural Machine Translation. ICLR, 2017.

[27] Victor O. K. Li et al. Universal Neural Machine Translation for Extremely Low Resource Languages. NAACL, 2018.