Fall of Giants: How popular text-based MLaaS fall against a simple evasion attack

The increasing demand for machine learning applications has led companies to offer Machine-Learning-as-a-Service (MLaaS). In MLaaS (a market estimated at 8 billion USD by 2025), users pay for well-performing ML models without dealing with the complexity of the training procedure. Among MLaaS offerings, text-based applications are the most popular (e.g., language translators). Given this popularity, MLaaS must provide resiliency to adversarial manipulations: a wrong translation, for example, might lead to a misunderstanding between two parties. In the text domain, state-of-the-art attacks mainly focus on strategies that exploit weaknesses of the ML models themselves. Unfortunately, little attention has been paid to the other stages of the pipeline, such as the indexing stage (i.e., where a sentence is converted from a textual to a numerical representation), which, if manipulated, can significantly affect the final performance of the application. In this paper, we propose a novel text evasion technique called the “Zero-Width attack” (ZeW), which injects human non-readable characters into the input to disrupt indexing-stage mechanisms. We demonstrate that our simple yet effective attack deceives the MLaaS of “giants” such as Amazon, Google, IBM, and Microsoft. Our case study, based on the manipulation of hateful tweets, shows that only one of the 12 analyzed services resists our injection strategy. Finally, we introduce and test a simple input-validation defense that prevents the proposed attack.
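To make the mechanism concrete, below is a minimal Python sketch of a zero-width injection and of the input-validation countermeasure described above. The specific zero-width code points (e.g., U+200B), the `zew_inject` and `sanitize` helpers, and the toy vocabulary indexer are illustrative assumptions for this sketch, not the paper's exact character set or implementation.

```python
import re

# Zero-width code points used as the injection alphabet (an assumption
# for this sketch; the paper's exact character set may differ).
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\ufeff"]
ZW_PATTERN = re.compile("[" + "".join(ZERO_WIDTH) + "]")

def zew_inject(text: str, zw: str = "\u200b") -> str:
    """Interleave a zero-width character between all characters: the
    rendered text looks unchanged to a human reader, but the resulting
    string no longer matches any vocabulary entry at the indexing stage."""
    return zw.join(text)

def sanitize(text: str) -> str:
    """Input-validation defense: strip zero-width characters before the
    text reaches the tokenizer/indexer."""
    return ZW_PATTERN.sub("", text)

# Toy indexing stage: whitespace tokenization plus vocabulary lookup,
# with out-of-vocabulary tokens mapped to <unk>.
VOCAB = {"you": 0, "are": 1, "stupid": 2, "<unk>": 3}

def index_tokens(text: str) -> list:
    return [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.split()]

if __name__ == "__main__":
    clean = "you are stupid"
    poisoned = zew_inject(clean)
    print(index_tokens(clean))               # [0, 1, 2]
    print(index_tokens(poisoned))            # [3, 3, 3] -- every token is <unk>
    print(index_tokens(sanitize(poisoned)))  # [0, 1, 2] -- defense restores it
```

The demo illustrates the core observation: the poisoned string renders identically to the clean one, yet every token falls out of the vocabulary and is indexed as unknown, while a one-line sanitization step restores the original representation.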
