TransFool: An Adversarial Attack against Neural Machine Translation Models

Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By integrating the embedding representation of a language model, we generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples. Experimental results demonstrate that, across different translation tasks and NMT architectures, our white-box attack can severely degrade translation quality while the semantic similarity between the original and adversarial sentences stays high. Moreover, we show that TransFool transfers to unknown target models. Finally, based on automatic and human evaluations, TransFool improves on existing attacks in terms of success rate, semantic similarity, and fluency, in both white-box and black-box settings. Thus, TransFool permits us to better characterize the vulnerability of NMT models and underlines the necessity of designing strong defense mechanisms and more robust NMT systems for real-life applications.
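As a rough illustration of the gradient-projection idea described above (not the authors' actual implementation), the toy sketch below perturbs token embeddings along the gradient of an adversarial loss and then projects each perturbed embedding back onto the nearest token in the vocabulary. The embedding table, the loss function, and the step size are all hypothetical stand-ins; in TransFool the loss combines translation degradation with similarity and fluency terms, and the embedding space comes from a language model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding table: 50 tokens, dimension 8. A real attack
# would use the language model's (or NMT model's) embedding matrix.
vocab = rng.normal(size=(50, 8))
sentence = np.array([3, 17, 29, 5])  # token ids of the clean input

def loss_grad(emb):
    # Gradient of a stand-in adversarial loss w.r.t. the embeddings.
    # Here: grad of 0.5 * ||emb||^2, purely for illustration.
    return emb

def project_to_tokens(emb):
    # Gradient projection step: map each continuous embedding back to
    # the nearest vocabulary token (Euclidean distance).
    dists = np.linalg.norm(vocab[None, :, :] - emb[:, None, :], axis=-1)
    return dists.argmin(axis=1)

# One attack iteration: ascend the loss in continuous embedding space,
# then project the result back to valid (discrete) tokens.
emb = vocab[sentence].copy()
step = 0.5
emb_adv = emb + step * loss_grad(emb)
adv_ids = project_to_tokens(emb_adv)
print(adv_ids)
```

Iterating this ascend-then-project loop until the translation quality drops (or an iteration budget is reached) gives the basic shape of a gradient-projection attack on discrete text.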
