Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training

In recent years, pre-trained multilingual language models, such as multilingual BERT and XLM-R, have exhibited strong performance on zero-shot cross-lingual transfer learning. However, since their multilingual contextual embedding spaces are not perfectly aligned across languages, the discrepancy between representations of different languages can cause zero-shot cross-lingual transfer to fail in some cases. In this work, we draw connections between those failure cases and adversarial examples. We then propose to use robust training methods to train a model that can tolerate some noise in the input embeddings. We study two widely used robust training methods: adversarial training and randomized smoothing. The experimental results demonstrate that robust training improves zero-shot cross-lingual transfer on text classification, and the improvements become more significant as the distance between the source and target languages increases.
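
The abstract does not include code, so the following is only a minimal sketch of what robust training in the embedding space of a multilingual encoder might look like. It assumes the HuggingFace `transformers` library and an XLM-R classification model; the single-step FGSM-style adversarial perturbation, the Gaussian noise used to mimic randomized-smoothing-style training, and all hyperparameters (epsilon, sigma, learning rate) are illustrative assumptions rather than the authors' exact setup.

```python
# Sketch: fine-tune a multilingual classifier on perturbed input embeddings,
# so the model learns to tolerate small embedding-space noise.
import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()


def robust_training_step(texts, labels, mode="adversarial", epsilon=1e-2, sigma=1e-2):
    """One training step that perturbs input embeddings instead of raw tokens."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = torch.tensor(labels)

    # Look up the input word embeddings so we can perturb them directly.
    embeds = model.get_input_embeddings()(batch["input_ids"]).detach()

    if mode == "adversarial":
        # FGSM-style perturbation: one gradient step on the embeddings,
        # kept inside an L_inf ball of radius epsilon.
        embeds.requires_grad_(True)
        loss = model(
            inputs_embeds=embeds,
            attention_mask=batch["attention_mask"],
            labels=labels,
        ).loss
        (grad,) = torch.autograd.grad(loss, embeds)
        perturbed = embeds + epsilon * grad.sign()
    else:
        # Randomized-smoothing-style training: add Gaussian noise to embeddings.
        perturbed = embeds + sigma * torch.randn_like(embeds)

    # Train the classifier on the perturbed embeddings.
    loss = model(
        inputs_embeds=perturbed.detach(),
        attention_mask=batch["attention_mask"],
        labels=labels,
    ).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, the classifier would be fine-tuned on source-language (e.g., English) data with one of the two modes and then applied directly to target-language inputs; the tolerance to embedding-space noise is what is expected to absorb part of the misalignment between languages.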
