On the Difficulty of Translating Free-Order Case-Marking Languages

Abstract

Identifying factors that make certain languages harder to model than others is essential to reach language equality in future Natural Language Processing technologies. Free-order case-marking languages, such as Russian, Latin, or Tamil, have proved more challenging than fixed-order languages for the tasks of syntactic parsing and subject-verb agreement prediction. In this work, we investigate whether this class of languages is also more difficult to translate by state-of-the-art Neural Machine Translation (NMT) models. Using a variety of synthetic languages and a newly introduced translation challenge set, we find that word order flexibility in the source language leads to only a very small loss of NMT quality, even though the core verb arguments become impossible to disambiguate in sentences without semantic cues. The latter issue is indeed solved by the addition of case marking. However, in medium- and low-resource settings, the overall NMT quality of fixed-order languages remains unmatched.
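To make the ambiguity argument concrete, the following is a minimal, purely illustrative sketch and not the paper's actual synthetic-language pipeline: it scrambles the order of a subject-verb-object triple and, optionally, appends invented "-nom"/"-acc" suffixes as stand-in case markers. Without the suffixes, a scrambled clause no longer reveals which noun is the agent; with them, the roles survive any reordering. All function names and suffixes here are hypothetical.

```python
import random

# Toy illustration (hypothetical, not the paper's data-generation procedure):
# turn a fixed-order (subject, verb, object) triple into a "free-order"
# sentence, optionally adding case suffixes so that agent and patient stay
# recoverable after the constituents are shuffled. The "-nom"/"-acc" suffixes
# are invented markers used only for this example.

def case_mark(subject, verb, obj):
    """Append nominative/accusative suffixes to the core arguments."""
    return subject + "-nom", verb, obj + "-acc"

def scramble(subject, verb, obj, seed=None):
    """Return the three constituents in a random order (free word order)."""
    rng = random.Random(seed)
    words = [subject, verb, obj]
    rng.shuffle(words)
    return " ".join(words)

if __name__ == "__main__":
    s, v, o = "dog", "bites", "man"

    # Without case marking, a scrambled "dog bites man" is indistinguishable
    # from a scrambled "man bites dog": the agent/patient roles are lost.
    print(scramble(s, v, o))

    # With case marking, the roles can be recovered under any word order.
    print(scramble(*case_mark(s, v, o)))
```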
