Bad Characters: Imperceptible NLP Attacks

Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks have struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to attack text-based models in a black-box setting without making any human-perceptible visual modification to inputs. We use encoding-specific perturbations that are imperceptible to the human eye to manipulate the outputs of a wide range of Natural Language Processing (NLP) systems, from neural machine-translation pipelines to web search engines. We find that a single imperceptible encoding injection, representing one invisible character, homoglyph, reordering, or deletion, can significantly reduce the performance of vulnerable models, and that with three injections most models can be functionally broken. Our attacks work against currently deployed commercial systems, including those produced by Microsoft and Google, as well as open-source models published by Facebook, IBM, and HuggingFace. This novel series of attacks presents a significant threat to many language-processing systems: an attacker can affect systems in a targeted manner without any assumptions about the underlying model. We conclude that text-based NLP systems require careful input sanitization, just like conventional applications, and that, given such systems are now being deployed rapidly at scale, they demand the urgent attention of architects and operators.
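To make the perturbation classes concrete, the following is a minimal illustrative sketch (our own, not code from the paper) of how invisible characters, homoglyphs, and bidirectional reorderings can be injected into a string using standard Unicode code points, together with a simple sanitizer of the kind the conclusion calls for. The function names and the `"paypal"` example are hypothetical; note that Unicode normalization plus category filtering removes invisible and control characters but does not, by itself, undo homoglyph substitutions, which require a separate confusables mapping.

```python
import unicodedata

ZWSP = "\u200b"      # ZERO WIDTH SPACE: an invisible character
RLO = "\u202e"       # RIGHT-TO-LEFT OVERRIDE: reorders rendered text
PDF = "\u202c"       # POP DIRECTIONAL FORMATTING: ends the override

def inject_invisible(text: str, pos: int) -> str:
    """Insert a zero-width space; the rendered text looks unchanged,
    but the underlying byte sequence the model sees differs."""
    return text[:pos] + ZWSP + text[pos:]

def swap_homoglyph(text: str) -> str:
    """Replace the first Latin 'a' with Cyrillic 'а' (U+0430),
    which is visually near-identical in most fonts."""
    return text.replace("a", "\u0430", 1)

def sanitize(text: str) -> str:
    """Defence sketch: NFKC-normalize, then drop format (Cf) and
    control (Cc) characters such as ZWSP and bidi overrides."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        c for c in text if unicodedata.category(c) not in ("Cf", "Cc")
    )

clean = "paypal"
attacked = inject_invisible(clean, 3)
assert attacked != clean                  # differs at the encoding level
assert sanitize(attacked) == clean        # invisible injection is stripped
assert sanitize(RLO + "hello" + PDF) == "hello"
assert sanitize(swap_homoglyph(clean)) != clean  # homoglyph survives
```

The last assertion is the important caveat: category filtering alone leaves homoglyph attacks intact, which is one reason the paper argues that sanitization must be designed carefully rather than bolted on.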
