BERT Rankers are Brittle: A Study using Adversarial Document Perturbations

Contextual ranking models based on BERT are now well established for a wide range of passage and document ranking tasks. However, the robustness of BERT-based ranking models under adversarial inputs is under-explored. In this paper, we argue that BERT-rankers are not immune to adversarial attacks targeting the documents retrieved for a query. First, we propose algorithms that adversarially perturb both highly relevant and non-relevant documents using gradient-based optimization: they add or replace a small number of tokens in a document in order to cause a large rank demotion or promotion. Our experiments show that perturbing only a few tokens can already change a document's rank substantially. Moreover, we find that BERT-rankers rely heavily on the beginning of a document for relevance prediction, making its initial part more susceptible to adversarial attack. More interestingly, we find a small set of recurring adversarial words that, when added to documents, reliably demote relevant documents or promote non-relevant ones. Finally, our adversarial tokens also show distinct topic preferences within and across datasets, exposing potential biases inherited from BERT pre-training or the downstream datasets.
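To make the gradient-based perturbation idea concrete, below is a minimal, hedged sketch of a HotFlip-style single-token attack on a BERT cross-encoder ranker: it backpropagates the relevance score to the input embeddings and picks the (position, replacement word) pair whose first-order estimate most lowers the score. The checkpoint name, the `demote_one_token` helper, and the greedy one-token variant are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
# Sketch: gradient-guided (HotFlip-style) token replacement that demotes a document
# under a BERT cross-encoder ranker. Assumes a single relevance logit per (query, doc) pair.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative choice of ranker; any fine-tuned BERT cross-encoder would do here.
MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def demote_one_token(query: str, document: str) -> str:
    """Replace the one document token whose first-order (gradient) substitution
    is estimated to lower the relevance score the most; return the perturbed text."""
    enc = tokenizer(query, document, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"]

    # Embed the (query, document) sequence and track gradients on the embeddings.
    embedding_matrix = model.get_input_embeddings().weight            # [V, d]
    inputs_embeds = embedding_matrix[input_ids].detach().clone()      # [1, L, d]
    inputs_embeds.requires_grad_(True)

    kwargs = {"attention_mask": enc["attention_mask"]}
    if "token_type_ids" in enc:
        kwargs["token_type_ids"] = enc["token_type_ids"]
    score = model(inputs_embeds=inputs_embeds, **kwargs).logits.squeeze()
    score.backward()
    grad = inputs_embeds.grad[0]                                       # [L, d]

    # Restrict candidate positions to document tokens (between the two [SEP]s).
    sep_positions = (input_ids[0] == tokenizer.sep_token_id).nonzero().flatten()
    doc_mask = torch.zeros(input_ids.size(1), dtype=torch.bool)
    doc_mask[sep_positions[0] + 1 : sep_positions[-1]] = True

    # First-order score change of swapping position i to word v:
    # delta[i, v] ~= grad_i . (e_v - e_i); pick the most negative entry.
    delta = grad @ embedding_matrix.T                                  # [L, V]
    delta -= (grad * inputs_embeds[0].detach()).sum(-1, keepdim=True)
    delta[~doc_mask] = float("inf")          # never touch query/special positions
    delta[:, tokenizer.all_special_ids] = float("inf")  # never insert special tokens

    flat = delta.argmin()
    pos, new_token = flat // delta.size(1), flat % delta.size(1)

    perturbed = input_ids.clone()
    perturbed[0, pos] = new_token
    return tokenizer.decode(perturbed[0][doc_mask], skip_special_tokens=True)
```

The rank-promotion variant of this sketch simply takes `argmax` instead of `argmin`, and a multi-token attack repeats the greedy step until the desired number of tokens has been added or replaced.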
