Explicit Cross-lingual Pre-training for Unsupervised Machine Translation

Pre-training has proven effective for unsupervised machine translation because it can model deep contextual information in cross-lingual scenarios. However, the cross-lingual information obtained from a shared BPE space is implicit and limited. In this paper, we propose a novel cross-lingual pre-training method for unsupervised machine translation that incorporates explicit cross-lingual training signals. Specifically, we first compute cross-lingual n-gram embeddings and infer an n-gram translation table from them. Using these n-gram translation pairs, we propose a new pre-training model, the Cross-lingual Masked Language Model (CMLM), which randomly masks source n-grams in the input text stream and predicts their translation candidates at each time step. Experiments show that our method incorporates beneficial cross-lingual information into pre-trained models. Using pre-trained CMLM models as the encoder and decoder, we significantly improve the performance of unsupervised machine translation.
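
The abstract does not spell out implementation details, so the following Python sketch is only illustrative of the two steps it describes. The helper names (infer_translation_table, cmlm_mask), the cosine-similarity nearest-neighbour lookup, the maximum n-gram length, and the masking rate p are all assumptions introduced here for clarity, not the paper's actual implementation.

    import random
    import numpy as np

    # Step 1 (sketch): infer an n-gram translation table from
    # cross-lingual n-gram embeddings by mapping each source n-gram
    # to its k nearest target n-grams under cosine similarity.
    def infer_translation_table(src_ngrams, tgt_ngrams, src_emb, tgt_emb, k=2):
        src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
        tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
        sim = src @ tgt.T                     # (|src|, |tgt|) cosine scores
        table = {}
        for i, ngram in enumerate(src_ngrams):
            top = np.argsort(-sim[i])[:k]     # indices of the k best candidates
            table[ngram] = [tgt_ngrams[j] for j in top]
        return table

    MASK = "[MASK]"

    # Step 2 (sketch): CMLM-style masking. Source n-grams found in the
    # translation table are randomly replaced with [MASK]; the prediction
    # targets at those positions are the n-gram's translation candidates
    # in the other language, rather than the original tokens.
    def cmlm_mask(tokens, table, max_n=3, p=0.15):
        tokens, targets = list(tokens), [None] * len(tokens)
        i = 0
        while i < len(tokens):
            for n in range(max_n, 0, -1):
                ngram = tuple(tokens[i:i + n])
                if ngram in table and random.random() < p:
                    for j in range(i, i + n):
                        targets[j] = table[ngram]   # predict translations here
                        tokens[j] = MASK
                    i += n
                    break
            else:
                i += 1                              # no maskable n-gram here
        return tokens, targets

In the full model, the masked stream would be fed through a Transformer encoder and a cross-entropy loss computed against the translation candidates at the masked positions; that training loop is omitted here since the abstract does not describe it.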
