Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots

Multilingual models have demonstrated impressive cross-lingual transfer performance. However, test sets like XNLI are monolingual at the example level. In multilingual communities, it is common for polyglots to code-mix when conversing with each other. Inspired by this phenomenon, we present two strong black-box adversarial attacks (one word-level, one phrase-level) for multilingual models that push their ability to handle code-mixed sentences to the limit. The word-level attack uses bilingual dictionaries to propose candidate perturbations, using translations of the clean example for sense disambiguation. The phrase-level attack directly aligns the clean example with its translations and extracts aligned phrases as perturbations. Our phrase-level attack has a success rate of 89.75% against XLM-R-large, bringing its average accuracy on XNLI down from 79.85% to 8.18%. Finally, we propose an efficient adversarial training scheme that trains in the same number of steps as the original model and show that it creates more language-invariant representations, improving clean and robust accuracy in the absence of lexical overlap, without degrading performance on the original examples.
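
To make the high-level recipe concrete, below is a minimal, hypothetical sketch of the phrase-level attack loop in Python. Everything here is illustrative rather than the authors' implementation: the function and argument names are invented, word alignments and translations are assumed to be precomputed (e.g., with an off-the-shelf aligner and MT system), and for brevity the sketch swaps single aligned tokens rather than extracted multi-token phrases.

```python
from typing import Callable, Dict, List, Tuple

def phrase_level_attack(
    tokens: List[str],                             # tokenized clean example
    translations: Dict[str, List[str]],            # language -> tokenized translation
    alignments: Dict[str, List[Tuple[int, int]]],  # language -> (clean_idx, trans_idx) pairs
    gold_prob: Callable[[List[str]], float],       # black-box query: P(gold label | sentence)
) -> List[str]:
    """Greedily code-mix the clean example: try each aligned substitution
    and keep it only if it lowers the victim model's confidence in the
    gold label. A minimal sketch of the idea, not the paper's algorithm."""
    adv = list(tokens)
    best = gold_prob(adv)
    # Candidate substitutions: (position in clean example, foreign token).
    candidates = [
        (src, translations[lang][tgt])
        for lang, pairs in alignments.items()
        for src, tgt in pairs
    ]
    for src, foreign in candidates:
        trial = list(adv)
        trial[src] = foreign
        p = gold_prob(trial)
        if p < best:  # keep the perturbation only if it hurts the model
            adv, best = trial, p
    return adv
```

Note that the attack is black-box: it never touches the victim model's gradients or parameters, only its output probabilities, queried here through the assumed `gold_prob` callable.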
