Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

There are several approaches to improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora for related language pairs can be used via parameter sharing or transfer learning in multilingual models; and subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the two target languages are related, one of them very low-resource and the other higher-resource. We test various methods on three artificially restricted translation tasks---English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish---and on one real-world task, Norwegian to North Sami and Finnish. The experiments show positive effects, especially for scheduled multi-task learning, denoising autoencoding, and subword sampling.
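
Of these techniques, subword sampling is the easiest to illustrate in isolation: instead of always segmenting a training sentence into its single best subword sequence, a different segmentation is sampled on each pass, so the model is exposed to many segmentations of the same words. The sketch below shows the idea with the sentencepiece Python package and a unigram subword model; it is an illustration only, not the exact setup used in the experiments. The corpus file name, vocabulary size, and sampling hyperparameters are assumptions for the example, and a recent sentencepiece release with the keyword-argument API is assumed.

    import sentencepiece as spm

    # Train a unigram subword model on target-side training text.
    # 'target_corpus.txt' is a hypothetical plain-text file, one sentence per line.
    spm.SentencePieceTrainer.train(
        input='target_corpus.txt',
        model_prefix='unigram_subword',
        vocab_size=16000,
        model_type='unigram',
    )

    sp = spm.SentencePieceProcessor(model_file='unigram_subword.model')

    sentence = 'koirat juoksevat puistossa'  # Finnish: "the dogs run in the park"

    # Deterministic 1-best segmentation, as used at inference time.
    print(sp.encode(sentence, out_type=str))

    # Subword sampling for training: draw a new segmentation each time.
    # nbest_size=-1 samples over all candidate segmentations of the unigram
    # model; alpha smooths the sampling distribution (smaller = more diverse).
    for _ in range(3):
        print(sp.encode(sentence, out_type=str,
                        enable_sampling=True, nbest_size=-1, alpha=0.1))

In such a setup, the training data would be re-segmented with sampling enabled at each epoch (or when minibatches are constructed), while validation and test data keep the deterministic segmentation.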
