Translation Artifacts in Cross-lingual Transfer Learning

Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique. In this paper, we show that such translation processes can introduce subtle artifacts that have a notable impact on existing cross-lingual models. For instance, in natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them, to which current models are highly sensitive. We show that some previous findings in cross-lingual transfer learning need to be reconsidered in light of this phenomenon. Based on the gained insights, we also improve the state of the art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
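
To make the lexical overlap effect concrete, below is a minimal sketch (not from the paper) that scores a premise-hypothesis pair by the fraction of hypothesis tokens that also appear in the premise. The tokenization scheme, the English sentence pair, and the two Spanish "translations" are all illustrative assumptions, not outputs of any real MT system.

```python
import string


def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the premise."""

    def tokens(text: str) -> list[str]:
        # Naive whitespace tokenization with punctuation stripped;
        # a real implementation would use a proper tokenizer.
        return [t.strip(string.punctuation) for t in text.lower().split()]

    premise_tokens = set(tokens(premise))
    hypothesis_tokens = tokens(hypothesis)
    if not hypothesis_tokens:
        return 0.0
    shared = sum(1 for t in hypothesis_tokens if t in premise_tokens)
    return shared / len(hypothesis_tokens)


# Original English pair: high overlap, since annotators often reuse
# words from the premise when writing the hypothesis.
print(lexical_overlap(
    "A man is playing a guitar on stage.",
    "A man is playing a guitar.",
))  # 1.0

# Hypothetical result of translating each sentence *independently*
# into Spanish: shared words may be rendered differently across the
# two MT calls (e.g. "toca" vs. "tocando"), so the measured overlap
# drops even though the semantic relation is unchanged.
print(lexical_overlap(
    "Un hombre toca una guitarra en el escenario.",
    "Un hombre está tocando la guitarra.",
))  # 0.5
```

Since independent translation passes make different lexical choices for the same source words, any model that leans on surface overlap as a shortcut will see its predictions shift, which is exactly the kind of artifact the abstract describes.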
