Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

In recent years, pre-trained multilingual language models, such as multilingual BERT and XLM-R, have exhibited strong performance on zero-shot cross-lingual transfer learning. However, since their multilingual contextual embedding spaces are not perfectly aligned across languages, the discrepancy between representations of different languages can cause zero-shot cross-lingual transfer to fail in some cases. In this work, we draw connections between those failure cases and adversarial examples. We then propose to use robust training methods to train a model that can tolerate some noise in the input embeddings. We study two widely used robust training methods: adversarial training and randomized smoothing. The experimental results demonstrate that robust training improves zero-shot cross-lingual transfer on text classification, and the improvements become more significant as the distance between the source and target languages increases.
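
The abstract does not include code, so the following is only a minimal sketch of what robust training in the embedding space of a multilingual encoder might look like. It assumes the HuggingFace `transformers` library and an XLM-R classification model; the single-step FGSM-style adversarial perturbation, the Gaussian noise used to mimic randomized-smoothing-style training, and all hyperparameters (epsilon, sigma, learning rate) are illustrative assumptions rather than the authors' exact setup.

```python
# Sketch: fine-tune a multilingual classifier on perturbed input embeddings,
# so the model learns to tolerate small embedding-space noise.
import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()


def robust_training_step(texts, labels, mode="adversarial", epsilon=1e-2, sigma=1e-2):
    """One training step that perturbs input embeddings instead of raw tokens."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = torch.tensor(labels)

    # Look up the input word embeddings so we can perturb them directly.
    embeds = model.get_input_embeddings()(batch["input_ids"]).detach()

    if mode == "adversarial":
        # FGSM-style perturbation: one gradient step on the embeddings,
        # kept inside an L_inf ball of radius epsilon.
        embeds.requires_grad_(True)
        loss = model(
            inputs_embeds=embeds,
            attention_mask=batch["attention_mask"],
            labels=labels,
        ).loss
        (grad,) = torch.autograd.grad(loss, embeds)
        perturbed = embeds + epsilon * grad.sign()
    else:
        # Randomized-smoothing-style training: add Gaussian noise to embeddings.
        perturbed = embeds + sigma * torch.randn_like(embeds)

    # Train the classifier on the perturbed embeddings.
    loss = model(
        inputs_embeds=perturbed.detach(),
        attention_mask=batch["attention_mask"],
        labels=labels,
    ).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, the classifier would be fine-tuned on source-language (e.g., English) data with one of the two modes and then applied directly to target-language inputs; the tolerance to embedding-space noise is what is expected to absorb part of the misalignment between languages.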
