Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

There are several approaches to improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora for related language pairs can be used via parameter sharing or transfer learning in multilingual models; and subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the two target languages are related, one of them very low-resource and the other higher-resource. We test various methods on three artificially restricted translation tasks---English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish---and on one real-world task, Norwegian to North Sami and Finnish. The experiments show positive effects, especially for scheduled multi-task learning, denoising autoencoding, and subword sampling.
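
Of these techniques, subword sampling is the easiest to illustrate in isolation: instead of always segmenting a training sentence into its single best subword sequence, a different segmentation is sampled on each pass, so the model is exposed to many segmentations of the same words. The sketch below shows the idea with the sentencepiece Python package and a unigram subword model; it is an illustration only, not the exact setup used in the experiments. The corpus file name, vocabulary size, and sampling hyperparameters are assumptions for the example, and a recent sentencepiece release with the keyword-argument API is assumed.

    import sentencepiece as spm

    # Train a unigram subword model on target-side training text.
    # 'target_corpus.txt' is a hypothetical plain-text file, one sentence per line.
    spm.SentencePieceTrainer.train(
        input='target_corpus.txt',
        model_prefix='unigram_subword',
        vocab_size=16000,
        model_type='unigram',
    )

    sp = spm.SentencePieceProcessor(model_file='unigram_subword.model')

    sentence = 'koirat juoksevat puistossa'  # Finnish: "the dogs run in the park"

    # Deterministic 1-best segmentation, as used at inference time.
    print(sp.encode(sentence, out_type=str))

    # Subword sampling for training: draw a new segmentation each time.
    # nbest_size=-1 samples over all candidate segmentations of the unigram
    # model; alpha smooths the sampling distribution (smaller = more diverse).
    for _ in range(3):
        print(sp.encode(sentence, out_type=str,
                        enable_sampling=True, nbest_size=-1, alpha=0.1))

In such a setup, the training data would be re-segmented with sampling enabled at each epoch (or when minibatches are constructed), while validation and test data keep the deterministic segmentation.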
