Reusing Monolingual Pre-Trained Models by Cross-Connecting Seq2seq Models for Machine Translation

This work reuses sequence-to-sequence (seq2seq) models pre-trained on monolingual corpora for machine translation. We pre-train two seq2seq models on monolingual corpora, one for the source language and one for the target language, and then combine the encoder of the source-language model with the decoder of the target-language model; we call this the cross-connection. Because the two modules are pre-trained completely independently, we add an intermediate layer between the pre-trained encoder and decoder to help map their representation spaces onto each other. These monolingual pre-trained models can serve as a multilingual pre-trained model, since any model can be cross-connected with a model pre-trained on any other language while its capacity remains unaffected by the number of languages. We demonstrate that our method significantly improves translation performance over a randomly initialized baseline. We further analyze the appropriate choice of intermediate layer, the importance of each part of a pre-trained model, and how performance changes with the size of the bitext.
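
The architecture can be pictured with a minimal PyTorch-style sketch (illustrative only, not the authors' implementation): `src_encoder` and `tgt_decoder` stand in for the two monolingual pre-trained modules, their call signatures are assumed, and the small feed-forward block is just one plausible instantiation of the intermediate layer whose choice the paper analyzes.

```python
# A minimal sketch of the cross-connection, not the authors' code.
# Assumed (hypothetical) interfaces: `src_encoder(src_ids)` returns hidden states of
# shape (batch, src_len, d_model); `tgt_decoder(tgt_ids, memory)` returns logits while
# cross-attending to `memory`. The two-layer feed-forward bridge is only one possible
# choice of intermediate layer.
import torch
from torch import nn


class CrossConnectedSeq2Seq(nn.Module):
    def __init__(self, src_encoder: nn.Module, tgt_decoder: nn.Module, d_model: int):
        super().__init__()
        self.encoder = src_encoder    # pre-trained on source-language monolingual data
        self.decoder = tgt_decoder    # pre-trained on target-language monolingual data
        # Trainable bridge between the two independently learned representation spaces.
        self.intermediate = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        memory = self.encoder(src_ids)        # (batch, src_len, d_model)
        memory = self.intermediate(memory)    # map encoder space toward decoder space
        return self.decoder(tgt_ids, memory)  # decoder cross-attends to bridged memory
```

Under this sketch, fine-tuning on bitext trains the bridge (and optionally the pre-trained modules), and swapping in a decoder pre-trained on a different language yields a new translation direction without retraining the encoder.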
